Time Highlighted Multi-Interest Network for Sequential Recommendation

Jiayi Ma, Tianhao Sun and Xiaodong Zhang

College of Computer Science, Chongqing University, Chongqing, 400044, China

Computers, Materials & Continua, 2023, Issue 9 (2023-10-26)

ABSTRACT Sequential recommendation based on a multi-interest framework aims to analyze different aspects of interest based on historical interactions and generate predictions of a user’s potential interest in a list of items. Most existing methods focus only on what the multiple interests behind interactions are but neglect the evolution of user interests over time. To explore the impact of temporal dynamics on interest extraction, this paper explicitly models the timestamp with a multi-interest network and proposes a time-highlighted network to learn user preferences, which considers not only the interests at different moments but also the possible trends of interest over time. More specifically, the time intervals between historical interactions and prediction moments are first mapped to vectors. Meanwhile, a time-attentive aggregation layer is designed to capture the trends of items in the sequence over time, where the time intervals are seen as additional information to distinguish the importance of different neighbors. Then, the learned items’ transition trends are aggregated with the items themselves by a gated unit. Finally, a self-attention network is deployed to capture multiple interests with the obtained temporal information vectors. Extensive experiments are carried out on three real-world datasets, and the results show that the proposed method outperforms other state-of-the-art baselines in terms of model performance.

KEYWORDS Recommender system; temporal dynamics; multi-interest network; trends; attention mechanism

1 Introduction

Recommender systems explore users’ potential interests based on historical interactions and then analyze the correlation between interests and items to provide personalized recommendations. Collaborative filtering [1] is a classic algorithm of recommender systems, and it works based on the fact that similar users may share similar preferences and similar items may be liked by the same users. Many traditional methods [2–4] are designed based on it to alleviate the problem of data sparsity. With the advent of deep learning, neural network-based approaches [5] have been proposed for their representation capacity. For example, Guo et al. [6] combined deep learning and factorization machines to learn both high-order and low-order feature interactions. Wang et al. [7] captured richer embedding representations based on high-order connectivity in the user-item graph. However, the above methods learn fixed user embeddings, which cannot capture dynamic demands because user preferences change over time.

Sequential recommendation treats historical interactions as a chronological sequence that contains the evolution of user preferences, in order to provide more accurate, customized, and dynamic recommendations. Early research on sequential recommendation predominantly relied on Markov Chains (MC) [8], which assume that each interaction depends on its preceding sequence. Recent methods [9,10] have achieved satisfactory performance by converting information into low-dimensional embeddings and using neural networks to learn user representations. For more effective and interpretable recommendations, some methods [11–13] utilized auxiliary relations in a knowledge graph to capture more connections between items and enrich item embeddings, rather than being limited to a sequence. In addition, the idea of self-supervised learning [14], which adaptively adjusts parameters by the difference between the current optimal solution and the global optimal solution, has been introduced into recommender systems; several models [15,16] extract user embeddings from multiple perspectives and pull them closer together to obtain a more accurate representation of users and improve model performance.

To better match real-world recommendation scenarios, multi-interest-based models [17,18] have been proposed to extract multiple interests that represent different aspects of user preferences. Following this idea, Tan et al. [19] constructed intent prototypes and assigned different weights to obtain various interests. Chen et al. [20] explored the periodicity and interactivity of user behavior sequences to enrich item embeddings and utilized an attention network to extract multiple interests. Nevertheless, these methods have the following two problems: (1) they only answer the question “What are the interests of users” without distinguishing the importance of different interests at the prediction time; (2) the transition trends of items in sequences, which can simulate the potential interest of users over time, are not fully mined. This work argues that temporal information is a key factor in extracting user interests, and that transition trends can reflect possible points of interest. As shown in Fig. 1, the user interacted with three categories of products in the last week: laptops, clothes, and desks. Previous models tend to treat each interest equally, so content that the user is no longer interested in is still recommended. By contrast, this work considers timestamps when extracting interests, pays more attention to recent interests, and recommends items that the user may like based on the trends. Besides, Fig. 2a shows the distribution of time intervals between two adjacent items on Amazon Books; it can be seen that the timespan in the historical sequences might be large. Traditional approaches treat user interests (items marked with black, blue, and green series) equally, but the time gap between the interactions related to the black series and the last interaction exceeds 100 days, so the user (whose most recent interests are the blue and green series) is unlikely to still be interested in the black series, which suggests the value of utilizing temporal information. Fig. 2b plots the behavior analysis of a user’s last thirty interactions, revealing that interactions tend to be grouped within a short period of time, and that items interacted with within a short period of time are often correlated. So, the transitions between items in the sequence can reflect the dynamic changes of points of interest to a certain extent.

Figure 1: An example of sequential recommendation using temporal information. The two more recent interests, “clothes & desks”, and their possible trends are mainly considered when predicting the top four items

Figure 2: Analysis of temporal information about user interactions on Amazon Books. (a) The distribution of time intervals between two consecutive interactions (the long tail of the distribution is not included). (b) The behavior analysis of a user’s last thirty interactions. The X-axis denotes the time since the first interaction, while the Y-axis denotes the interaction count. Items of different categories are represented by circles of different colors

Existing multi-interest methods cannot capture the temporal dynamics of user preferences and do not make full use of the interaction history to simulate the changing trends of user interests. To address these problems, this paper proposes a novel method called time highlighted multi-interest network for sequential recommendation (TimiRec), which assigns different weights to multiple interests based on the prediction moment and aggregates the updated neighbor item representations as the transition trends of the current item. The main contributions of this work are summarized as follows:

• Based on the prediction moment, a linear time interval function is designed to generate time interval information as the key factor to extract multiple interests from the user’s behavior sequence.

• A time-attentive aggregation layer is introduced to aggregate neighbor items, in which an attention network is used to update the representations of neighbor items by considering the time intervals between the item and its neighbors, thereby capturing the changing trends of points of interest. A gated unit is then deployed to adaptively fuse the initial item embeddings and the captured trends.

• The effectiveness of the proposed algorithm is evaluated by comparing it with various baseline methods on three real-world datasets, and the results show that the proposed model outperforms state-of-the-art baselines.

2 Related Work

2.1 Sequential Recommendation

Sequential recommender systems treat users’ historical behaviors in chronological order and model the dependencies between items to provide more accurate recommendations. The model based on Markov Chains [8,21] is one of the most classical models. Nevertheless, a significant shortcoming of MC-based methods is their limited consideration of long-term dependencies, since they only rely on the most recent interactions. The field of sequential recommendation has embraced the advancements in deep learning, incorporating the Recurrent Neural Network (RNN) and its variants, such as Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Zhou et al. [22] used GRU to increase the accuracy of prediction by supervising the hidden state. However, RNN-based methods, which take the current state and the previous state as input, still have several problems, such as difficulty in parallelization and in learning long-term dependencies. To tackle these problems, inspired by Transformer [23], Kang et al. [10] stacked self-attention layers to capture item relevance. Moreover, recent studies have focused on incorporating interaction timestamps into the sequential modeling process. For instance, Li et al. [24] added relative time interval and position information into item embeddings. Following a normal distribution, Wang et al. [11] introduced two distinct temporal kernel functions to explicitly model the evolution of user preferences over time in terms of “complement” and “substitute” relations. Jiang et al. [25] designed a time weighting function to enhance the influence of the time effect of evaluation.

2.2 Attention Mechanism

The attention mechanism is a technique that considers the importance of each item in the input sequence to the output, recognizing that not all items in the sequence are equally important. Chen et al. [26] were among the first to introduce the attention mechanism into recommender systems as an additional component. Xiao et al. [27] combined attention networks and the Factorization Machine (FM) to improve the performance and interpretability of the model. Wang et al. [28] learned an attentive transactional context embedding that paid more attention to relevant items. Cai et al. [29] measured the importance of different friends in social networks based on the attention mechanism. More recently, Vaswani et al. [23] proposed a sequence-to-sequence method named Transformer with a pure attention mechanism, which surpasses Convolutional Neural Network (CNN)/RNN-based approaches and achieves state-of-the-art performance. Instead of propagating sequence information step by step, Transformer introduces the concepts of query, key, and value to capture the relationship between items in the sequence, and then updates each item based on the similarity. This enables the model to focus on all relevant parts of the sequence simultaneously, which allows for better modeling of long-term dependencies and improves the overall performance of the model.

3 Methodology

In this section, we begin by formulating the task of sequential recommendation and subsequently present the approach to map time intervals to corresponding vectors and construct the neighbor-aware graph, as well as the details of the proposed framework.

3.1 Problem Formulation

3.2 Linear Time Interval

To capture temporal dynamics, a simple way is to use a time decay function [11,30], but its disadvantages are that the fitting capacity is limited and the importance weights are not normalized. Another way [24,31] is to map timestamps to vectors; although this can improve model performance, its calculation of relative time intervals is overly dependent on the minimum value.

Besides, note that in Fig. 2b, interactions tend to be grouped in a short time, indicating the importance of ensuring that time intervals within a small range are treated equally. Taking advantage of the above two approaches, this paper designs a linear time interval function to model the effect of temporal information. Specifically, for a given anonymous time sequence $T=[t_1,t_2,\ldots,t_{|T|}]$, the linear time interval between the interaction at time $t$ and the recommendation moment is defined as follows:

where $t_r$ represents the timestamp of the target item, and $\alpha$ is the coefficient of the linear function. The value of $\alpha$ is adjusted to ensure that time intervals falling within the same range determined by $\alpha$ are treated as the same. The linear time interval sequence is then transformed into $I=[i_1,i_2,\ldots,i_{|T|}]$ with a clip operation to avoid sparse relation encoding:

where $m$ is the threshold.
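The two equations referred to above did not survive in this text, so the following is only a minimal sketch of one form consistent with the description: the gap to the prediction moment is divided into buckets of width $1/\alpha$ (written `unit` below), and the bucket index is clipped at the threshold $m$. The floor-style bucketing, the integer `unit` parameter, and the function name are assumptions, not the authors' exact definition.

```python
def linear_time_intervals(timestamps, t_r, unit, m):
    """Bucket the gap between each interaction and the prediction moment.

    timestamps: interaction times in integer seconds, oldest first
    t_r:        timestamp of the target item (the prediction moment)
    unit:       1/alpha, the bucket width in seconds (assumed to be an integer)
    m:          clipping threshold
    """
    intervals = []
    for t in timestamps:
        raw = max(t_r - t, 0) // unit   # gaps inside one bucket share an index
        intervals.append(min(raw, m))   # clip so that very old gaps share index m
    return intervals


DAY = 24 * 60 * 60
history = [0, 10 * DAY, 39 * DAY + 3600, 39 * DAY + 7200]
print(linear_time_intervals(history, t_r=40 * DAY, unit=DAY, m=32))
# [32, 30, 0, 0]
```

In this example the two interactions from the final day fall into the same bucket, while the 40-day-old interaction is clipped to the threshold, which is the behavior the paragraph above asks for.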

3.3 Neighbor-Aware Graph Construction

Generally, adjacent interactions in a sequence are often related to similar interests. To capture the changing trends of items and enrich the representation of item $x_i$, we attentively aggregate the embeddings of its neighbors (i.e., the $\delta$-neighbor set) in graph $G$, which is generated based on the pairwise item transitions in the interaction records of all users. Since the time interval between paired items can indicate their correlation, it is necessary to utilize it to distinguish the importance of different neighbors. Note that the same pairwise item transition may occur at multiple different time intervals, so the largest one is chosen to cover all possible situations and ensure the comprehensiveness of the model.
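The construction described above can be sketched as follows. This is an illustrative reading, not the authors' code: items within $\delta$ positions of each other in any user sequence are treated as neighbors, the largest observed time gap is kept for each pair, and at most `top_h` neighbors are retained per item. Ranking neighbors by co-occurrence count is an assumption, since the selection criterion is not stated in this text.

```python
from collections import defaultdict


def build_neighbor_graph(user_sequences, delta=2, top_h=20):
    """user_sequences: list of [(item, timestamp), ...], each sorted by time.

    Returns {item: [(neighbor, largest_gap), ...]} with at most top_h neighbors.
    """
    counts = defaultdict(lambda: defaultdict(int))
    largest_gap = defaultdict(dict)
    for seq in user_sequences:
        for i, (item, t_i) in enumerate(seq):
            lo, hi = max(0, i - delta), min(len(seq), i + delta + 1)
            for j in range(lo, hi):
                neighbor, t_j = seq[j]
                if j == i or neighbor == item:
                    continue
                counts[item][neighbor] += 1
                gap = abs(t_i - t_j)
                # The same pair may occur at several gaps; keep the largest one.
                largest_gap[item][neighbor] = max(gap, largest_gap[item].get(neighbor, 0))
    graph = {}
    for item, neighbors in counts.items():
        # Assumption: rank neighbors by co-occurrence count, ties broken by item id.
        ranked = sorted(neighbors, key=lambda nb: (-neighbors[nb], nb))[:top_h]
        graph[item] = [(nb, largest_gap[item][nb]) for nb in ranked]
    return graph


sequences = [[("a", 0), ("b", 5), ("c", 9)], [("a", 0), ("b", 20)]]
print(build_neighbor_graph(sequences, delta=1, top_h=1))
```

Here the pair (a, b) occurs with gaps of 5 and 20, so the stored gap is 20.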

3.4 TimiRec Framework

Fig. 3 provides an overview of our proposed framework, TimiRec. Each part of the model is described in detail next.

Figure 3: The architecture of TimiRec. The linear function is applied to time interval information. A neighbor-aware graph construction step and a time-attentive aggregation layer are deployed to sample top-H collaborative neighbors and model the trends of items over time, respectively. Finally, the multi-interest extraction layer is introduced to generate diverse preferences from the output of the gated fusion unit

3.4.1 Embedding Layer

The interaction sequence $[x_1,x_2,\ldots,x_{|X|}]$ is converted into a fixed-length sequence $X=[x_1,x_2,\ldots,x_L]$, where $L$ represents the maximum length. If the sequence is longer than $L$, only the most recent $L$ items are kept; if it is shorter than $L$, it is padded with zeros on the left. The time interval sequence $[i_1,i_2,\ldots,i_{|T|}]$ is likewise transformed into $I=[i_1,i_2,\ldots,i_L]$ to maintain the corresponding time interval of each interaction.
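As a small illustration of the truncation and left-padding step, the sketch below assumes that item id 0 is reserved for padding and returns an extra mask so that padded positions can be told apart from real interactions whose interval happens to be 0. The mask and the function name are additions for illustration only.

```python
def to_fixed_length(items, intervals, max_len, pad=0):
    """Keep the most recent max_len entries and left-pad shorter sequences."""
    items = items[-max_len:]
    intervals = intervals[-max_len:]
    n_pad = max_len - len(items)
    mask = [0] * n_pad + [1] * len(items)   # 1 marks a real interaction
    return [pad] * n_pad + items, [pad] * n_pad + intervals, mask


print(to_fixed_length([11, 12, 13], [9, 4, 0], max_len=5))
# ([0, 0, 11, 12, 13], [0, 0, 9, 4, 0], [0, 0, 1, 1, 1])
```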

$E_X \in \mathbb{R}^{|V| \times d}$ is the learnable embedding matrix for all items, where $d$ represents the latent dimension. Then, the embedding of the behavior sequence $\mathbf{X} \in \mathbb{R}^{L \times d}$ is obtained by a lookup operation. Similar to the behavior sequence, another embedding matrix $E_I \in \mathbb{R}^{m \times d}$ is created for the linear time interval sequence $I$, where $m$ represents the maximum number of time intervals. After retrieval, the time interval sequence embedding $\mathbf{I} \in \mathbb{R}^{L \times d}$ is obtained.

3.4.2 Time-Attentive Neighbor Relation Aggregation

3.4.3 Multi-Interest Extraction Layer

Considering the impact of temporal dynamics, the temporal embedding $\mathbf{I} \in \mathbb{R}^{L \times d}$ is used as the query of the attention mechanism to selectively aggregate behavior sequences:

where $W_3 \in \mathbb{R}^{d \times 4d}$ and $W_4 \in \mathbb{R}^{4d \times K}$ are learnable parameters. $A \in \mathbb{R}^{L \times K}$ estimates the contribution of each item in the sequence to the multiple interests, $S \in \mathbb{R}^{K \times d}$ indicates the multiple interests, and $K$ is the number of interests.

Figure 4: Illustration of the time-attentive aggregation layer, which aggregates the representation of neighbor items by considering time interval information
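The attention equation itself is missing from this text, so the sketch below only illustrates a self-attentive pooling whose shapes match the definitions above ($W_3$: $d \times 4d$, $W_4$: $4d \times K$, $A$: $L \times K$, $S$: $K \times d$). It assumes that the time embedding is added to the item embedding before scoring and that the softmax runs over the sequence axis; how the paper actually combines $\mathbf{I}$ with the sequence cannot be recovered from this text, so this should be read as a hypothetical instance rather than the authors' formula.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def extract_interests(X, I, W3, W4):
    """X, I: L x d item and time-interval embeddings; W3: d x 4d; W4: 4d x K.

    Returns (A, S): A is L x K attention weights, S is K x d interest vectors.
    """
    # Assumption: time information enters by adding I to X before scoring.
    H = [[x + i for x, i in zip(x_row, i_row)] for x_row, i_row in zip(X, I)]
    hidden = [[math.tanh(v) for v in row] for row in matmul(H, W3)]   # L x 4d
    scores = matmul(hidden, W4)                                       # L x K
    num_interests = len(scores[0])
    weights = []                                                      # K x L
    for k in range(num_interests):
        column = [row[k] for row in scores]
        peak = max(column)
        exps = [math.exp(c - peak) for c in column]   # softmax over the L positions
        total = sum(exps)
        weights.append([e / total for e in exps])
    S = matmul(weights, X)                            # K x d weighted sums of items
    A = [list(row) for row in zip(*weights)]          # transpose back to L x K
    return A, S


X = [[1.0, 0.0], [0.0, 1.0]]
I = [[0.3, -0.2], [0.1, 0.4]]
W3 = [[0.1] * 8, [0.2] * 8]
W4 = [[0.5, -0.5] for _ in range(8)]
A, S = extract_interests(X, I, W3, W4)
print(len(A), len(A[0]), len(S), len(S[0]))
# 2 2 2 2
```

Each column of `A` sums to one, so every interest vector in `S` is a convex combination of the item embeddings in the sequence.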

3.4.4 Model Training

After obtaining the interest embeddings from the multi-interest extraction layer, for the target item $v$, the argmax operator is used to select the corresponding interest vector from the $K$ candidate interest representations as the user embedding:

When provided with a training sample $(u,v)$ containing the user embedding $s$ and the item embedding $e_v$, the probability that user $u$ will interact with item $v$ is calculated as follows:

Since the sum operator in Eq. (11) is computationally time-consuming, a sampled softmax method is introduced to minimize the following objective function:
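The training step above can be sketched as follows. Because the equations are missing from this text, two assumptions are made explicit: the interest is selected by the largest inner product with the target item embedding, and the loss is a softmax cross-entropy over the target plus a handful of sampled negatives. Production sampled-softmax implementations typically also correct for the sampling distribution, which is omitted here; the function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_interest(S, e_v):
    """Pick the interest vector that best matches the target item (assumed: inner product)."""
    best = max(range(len(S)), key=lambda k: dot(S[k], e_v))
    return S[best]


def sampled_softmax_loss(s_u, e_v, negatives):
    """Negative log-likelihood of the target among the target plus sampled negatives."""
    logits = [dot(s_u, e_v)] + [dot(s_u, e_n) for e_n in negatives]
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[0]


S = [[1.0, 0.0], [0.0, 1.0]]          # K = 2 interest vectors
e_v = [0.2, 0.9]                      # target item embedding
s_u = select_interest(S, e_v)
print(s_u)
# [0.0, 1.0]
print(sampled_softmax_loss(s_u, e_v, negatives=[[0.9, 0.1], [0.5, 0.5]]))
```

Only the sampled negatives appear in the normalizer, which is what avoids the full sum over all items.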

3.4.5 Model Testing

Different from the training phase, in the testing phase the different interests, which represent different aspects of user preferences, can each independently provide top-$N$ items. To get the final top-$N$ prediction results from the $K \times N$ candidate items, a straightforward approach is to select the items with the greatest similarity as the final prediction, based on the inner product of candidate items and user interests, which is defined as:

where $e_v$ is the embedding of a candidate item, and $s_k$ indicates the $k$-th interest.
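A minimal sketch of this two-stage selection, assuming a small in-memory dictionary of item embeddings (a real system would use an approximate nearest-neighbor index for the per-interest retrieval): each interest first retrieves its own top-$N$ items, and the pooled candidates are then re-ranked by their best inner product over all interests. The function and variable names are illustrative.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recommend_top_n(S, item_embeddings, n):
    """S: K interest vectors; item_embeddings: {item_id: vector}."""
    candidates = set()
    for s_k in S:
        # Stage 1: each interest independently retrieves its top-n items.
        top = heapq.nlargest(n, item_embeddings, key=lambda v: dot(item_embeddings[v], s_k))
        candidates.update(top)

    def best_score(v):
        return max(dot(item_embeddings[v], s_k) for s_k in S)

    # Stage 2: re-rank the pooled candidates by their best-matching interest.
    ranked = sorted(candidates, key=lambda v: (-best_score(v), v))
    return ranked[:n]


S = [[1.0, 0.0], [0.0, 1.0]]
items = {"a": [0.9, 0.0], "b": [0.0, 0.8], "c": [0.5, 0.5], "d": [0.1, 0.1]}
print(recommend_top_n(S, items, n=2))
# ['a', 'b']
```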

4 Experiments

In this section, to validate the effectiveness of the proposed framework, extensive experiments comparing it with state-of-the-art baseline methods are carried out on three real-world datasets.

4.1 Experimental Settings

4.1.1 Datasets

TimiRec is evaluated on three public datasets of diverse domains and sizes; Table 1 presents the statistics of these datasets.

Table 1: The dataset statistics

• Amazon (http://jmcauley.ucsd.edu/data/amazon/): A commonly used review dataset that comprises various sub-datasets, of which the following two are used: Books and Beauty.

• MMTD (http://www.cp.jku.at/datasets/MMTD/): The Million Musical Tweets Dataset (MMTD) [32] is a dataset of listening events collected from Twitter.

To ensure data quality, interactions involving users and items with fewer than 5 occurrences are filtered out, and all users are split into training/validation/test sets in a ratio of 8:1:1. For model training, the complete interaction sequences of the training users are utilized. More specifically, for a user sequence $X_u$, each training sample uses the first $t$ behaviors to predict the $(t+1)$-th interaction, where $t=4,5,\ldots,|X_u|-1$. The number of neighbors $H$ is 20, and the distance $\delta$ between the item and its neighbors is set to 2. Each training sample is truncated to a length of 20. For model testing, the first 80% of the interactions in each validation or test user’s sequence are used as input to the trained model to infer user embeddings, and metrics are calculated by predicting the remaining 20% of interactions.
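The sample construction described above can be sketched as follows. The helper names are illustrative, and rounding the 80% cut point down is an assumption, since the rounding rule is not stated.

```python
def training_samples(sequence, min_prefix=4, max_len=20):
    """Use the first t behaviors to predict the (t+1)-th, for t = min_prefix .. len-1."""
    samples = []
    for t in range(min_prefix, len(sequence)):
        history = sequence[:t][-max_len:]    # truncate to the most recent max_len items
        samples.append((history, sequence[t]))
    return samples


def split_for_eval(sequence, ratio=0.8):
    """First 80% of a held-out user's interactions as input, the rest as targets."""
    cut = int(len(sequence) * ratio)         # assumption: round down
    return sequence[:cut], sequence[cut:]


print(training_samples([1, 2, 3, 4, 5, 6]))
# [([1, 2, 3, 4], 5), ([1, 2, 3, 4, 5], 6)]
print(split_for_eval(list(range(10))))
# ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])
```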

4.1.2 Evaluation Metrics

To evaluate the performance of the proposed TimiRec, three widely adopted evaluation criteria for top-N recommendation are used in our experiments, i.e., Recall, Normalized Discounted Cumulative Gain (NDCG), and Hit Ratio (HR). Among them, Recall@N represents the proportion of ground-truth items included in the recommended N candidates, NDCG@N is a position-aware metric that assigns higher scores to ground-truth items appearing at higher positions in the recommendation list, and HR@N indicates whether any ground-truth item is present among the recommended items. N is set to 20 and 50 in our experiments.
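For one user, the three metrics can be computed as in the sketch below. It uses one common formulation of NDCG with binary relevance and an ideal ranking of min(|ground truth|, N) hits; the exact normalization used in the paper’s evaluation code is not given in this text, so that choice is an assumption. The ground-truth set is assumed to be non-empty.

```python
import math


def metrics_at_n(recommended, ground_truth, n):
    """Return (Recall@N, NDCG@N, HR@N) for a single user."""
    truth = set(ground_truth)
    hits = [1 if item in truth else 0 for item in recommended[:n]]
    n_hits = sum(hits)
    recall = n_hits / len(truth)
    # Binary-relevance DCG: a hit at 0-based position p contributes 1 / log2(p + 2).
    dcg = sum(h / math.log2(pos + 2) for pos, h in enumerate(hits))
    idcg = sum(1 / math.log2(pos + 2) for pos in range(min(len(truth), n)))
    ndcg = dcg / idcg
    hit_ratio = 1.0 if n_hits > 0 else 0.0
    return recall, ndcg, hit_ratio


recall, ndcg, hr = metrics_at_n(["a", "b", "c"], ["a", "x"], n=3)
print(recall, hr)
# 0.5 1.0
print(round(ndcg, 4))
# 0.6131
```

The reported numbers are then averages of these per-user values over all test users.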

4.1.3 Baselines

Comparative evaluations are conducted with TimiRec and the following methods:

• YouTube DNN [33]: a deep learning model designed for industrial recommendation that applies deep neural networks to YouTube video recommendation.

• GRU4Rec [9]: the first model to apply GRU to sequential recommendation.

• MIND [17]: a multi-interest model that incorporates a capsule network to extract multiple user interests.

• ComiRec-DR [18]: a model that follows MIND in extracting multiple interests using dynamic routing, and considers both the diversity and accuracy of recommendation with a controllable factor.

• ComiRec-SA [18]: another variant of ComiRec that utilizes self-attention to extract diverse interests.

• PIMI [20]: a state-of-the-art model based on ComiRec-SA, in which the periodicity and interactivity of the user behavior sequence are explored to collect features before extracting multiple interests.

4.1.4 Details

TimiRec is implemented with TensorFlow. The embedding dimension is set to 64. For Books, MMTD, and Beauty, the reciprocal of the coefficient, $1/\alpha$, is set to 1 day, 1 min, and 1 day, and the time interval thresholds $m/n$ are set to 32/8, 128/16, and 64/16, respectively. The number of interest embeddings is 4. The number of samples for the sampled softmax loss is set to 10. The maximum number of training iterations is set to 1 million. The widely used Adam optimizer [34] is adopted with a learning rate $lr=0.001$.

4.2 Overall Performance

Table 2 shows a comprehensive summary of the performance of various methods across different datasets. Compared with GRU4Rec and YouTube DNN, which represent users as a single vector, the multi-interest methods MIND, ComiRec, and PIMI achieve significant improvement, which implies that extracting multiple interests is more consistent with real-world scenarios. By incorporating the concepts of periodicity and interactivity in the user behavior sequence, PIMI obtains notable enhancements compared to ComiRec. Our proposed method, TimiRec, consistently outperforms the other baseline methods by effectively capturing the temporal dynamics of user interests.

4.3 Ablation Study

Table 3 presents the comparison results under the evaluation metrics of Recall@20, NDCG@20, and HR@20, where TimiRec-time and TimiRec-neigh denote TimiRec without temporal information and without the neighbor aggregation unit, respectively. Removing either part harms the results, suggesting that both are important for capturing sequential information. Besides, the lack of temporal information leads to a bigger performance loss. This is because the time interval information directly affects the weight of the current item when extracting interests, while neighbors are aggregated into items only to enrich the item representation.

Table 2: Performance comparison of TimiRec and other baselines (%). In each row, the best performance is bolded, and the second best is underlined

4.4 Time Interval Function

To verify the effectiveness of the designed linear time interval function, we remove the neighbor aggregation layer (i.e., TimiRec-neigh in Section 4.3) and modify Eq. (1) as:

Eq. (14) represents the logarithmic function, where $\beta$ is a coefficient; it assumes that the effect of time on user interests gradually slows down as the time interval increases. Eq. (15) is the processing method proposed by Li et al. [24], where $\min(i_u)$ means the minimum time interval of user $u$. Because each training sample relies on the first $t$ behaviors to predict the $(t+1)$-th interaction, the value of $\min(i_u)$ may change from sample to sample, so the relation between items will also change. Table 4 illustrates the performance when the time function changes. Experimental results demonstrate the improvement brought by the proposed method.

Table 4: The performance of the time interval function (%)
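The instability described for the minimum-based scaling can be illustrated with a toy example. Eq. (15) is not present in this text, so the sketch assumes one reading of it: every gap to the prediction moment is divided by the smallest positive gap of that sample. Both function names and this exact form are assumptions made for illustration.

```python
def min_scaled_intervals(timestamps, t_r):
    """Assumed reading of the minimum-based scaling: divide by the smallest positive gap."""
    gaps = [t_r - t for t in timestamps]
    base = min(g for g in gaps if g > 0)
    return [g // base for g in gaps]


def fixed_unit_intervals(timestamps, t_r, unit):
    """Fixed bucket width (1/alpha), independent of the sample at hand."""
    return [(t_r - t) // unit for t in timestamps]


days = [0, 2, 3, 10]   # one user's interaction days
# Sample A: history [0, 2] predicts the interaction on day 3.
# Sample B: history [0, 2, 3] predicts the interaction on day 10.
print(min_scaled_intervals(days[:2], t_r=3), min_scaled_intervals(days[:3], t_r=10))
# [3, 1] [1, 1, 1]
print(fixed_unit_intervals(days[:2], t_r=3, unit=1), fixed_unit_intervals(days[:3], t_r=10, unit=1))
# [3, 1] [10, 8, 7]
```

Under the minimum-based scaling, the items from day 0 and day 2 are three buckets versus one bucket away in sample A but become indistinguishable in sample B, because the divisor changed from 1 to 7. With a fixed unit, the scale stays the same across samples.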

4.5 Study on Hyper-Parameters

In order to gain a deeper understanding of how different hyper-parameters impact the performance of the model, the effects of the time interval thresholds $m$ and $n$, the coefficient of the time interval function $\alpha$, and the number of interests $K$ are studied.

As shown in Tables 5 and 6, the values of the time interval thresholds $m$ and $n$ are varied to investigate the effect of modeling temporal dynamics. The parameter $m$ controls the maximum time scope that directly affects user interests, and $n$ ensures the quality of the aggregated neighbor items. The values of $m$ and $n$ are selected from {16, 32, 64, 128} and {2, 4, 8, 16}, respectively. Experimental results on Amazon Books show that the model achieves the best performance when $m/n$ is set to 32/8. An excessively large time interval threshold may result in sparse encoding; conversely, a threshold that is too small may lead to insufficient learning.

Table 5: Effect of threshold m (%)

Fig. 5 illustrates the comparison of model performance on Amazon Books across different values of the coefficient $\alpha$. The X-axis indicates $1/\alpha$, which is chosen from {0.25 days, 0.5 days, 1 day, 1.5 days, 2 days}. As can be observed, TimiRec has the best performance when $1/\alpha$ is 1 day. Combined with the best setting of 32 for $m$ (32 intervals of 1 day each), it can be further inferred that a person reads a book in about one month.

Figure 5: Performance comparison for different values of the coefficient α on Amazon Books

Fig. 6 presents the Metrics@20 and Metrics@50 results, demonstrating the effect of the number of interests $K$ on Amazon Books. TimiRec obtains the best performance when $K$ is 4. Increasing the number of interests does not always improve the model, which is in line with real-world recommendation scenarios, where users usually have neither too many nor too few interests.

Figure 6: Effect of the number of interests K on Amazon Books

4.6 Training Efficiency

As shown in Fig. 7, Recall@20 on Amazon Books is tracked during training for the proposed model and two other state-of-the-art methods to compare their training efficiency. Recall@20 follows roughly the same trend over iterations for all three models. In terms of the average time per iteration, TimiRec takes 0.050 s, which is 1.79 times that of ComiRec-SA; this is attributable to the aggregation of neighbor information and the preprocessing of time interval information. Compared with PIMI, however, whose average iteration time is 0.100 s, the training efficiency of our model is greatly improved, because the computation of the stacked three-layer self-attention network in PIMI’s interactivity module is very time-consuming.

Figure 7: Training efficiency on Amazon Books

4.7 Case Study

The attention weights among multiple interests and items in the input sequence are visualized, which demonstrates the advantage of the proposed method by comparison with ComiRec-SA. Fig. 8 illustrates the heatmap of the attention weights (corresponding to the value of $A$ in Eq. (9)) associated with a user randomly selected from Amazon Books. Compared with ComiRec-SA, which is not aware of time, it can be seen that for items under the same interest, TimiRec assigns higher weights to the more recently interacted items. For items of the same category with smaller timespans, TimiRec assigns similar weights (Interest 2).

Figure 8: Heatmap of attention weights among multiple interests and input items. I represents the linear time interval sequence corresponding to the input sequence of a user from Amazon Books

5 Conclusion

This work proposes a novel framework named TimiRec, which utilizes temporal information to extract multiple user interests. Specifically, multiple interests of users are generated by highlighting the time intervals in the multi-interest extraction layer, and are combined with the neighbor-aware aggregation unit to capture possible trends of points of interest. The effectiveness and efficiency of the proposed model have been empirically verified through experiments conducted on three real-world datasets. In future work, we will combine temporal information and knowledge graphs to build bridges between items and further explore their relationships, in order to capture the possible trends of interests more comprehensively.

Acknowledgement: The resources and computing environment are provided by Chongqing University, Chongqing, China. We are thankful for their support.

Funding Statement: This work is supported in part by the National Natural Science Foundation of China under Grant 61702060.

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design, analysis and interpretation of results: Jiayi Ma; draft manuscript preparation: Jiayi Ma, Tianhao Sun; data collection: Jiayi Ma, Xiaodong Zhang. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The authors have shared the link to the data in the paper.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
