Emotions and Activity Profiles of Influential Users in Product Reviews Communities

Tanase, Dorian; Garcia, David; Garas, Antonios; Schweitzer, Frank

doi:10.3389/fphy.2015.00087

ORIGINAL RESEARCH article

Front. Phys., 17 November 2015
Sec. Interdisciplinary Physics
Volume 3 - 2015 | https://doi.org/10.3389/fphy.2015.00087

Dorian Tanase

David Garcia^*

Antonios Garas

Frank Schweitzer

Chair of Systems Design, ETH Zurich, Zurich, Switzerland

Viral marketing seeks to maximize the spread of a campaign through an online social network, often targeting influential nodes with high centrality. In this article, we analyze behavioral aspects of influential users in trust-based product reviews communities, quantifying emotional expression, helpfulness, and user activity level. We focus on two independent product review communities, Dooyoo and Epinions, in which users can write product reviews and define trust links to filter product recommendations. Following the patterns of social contagion processes, we measure user social influence by means of the k-shell decomposition of trust networks. For each of these users, we apply sentiment analysis to extract their extent of positive, negative, and neutral emotional expression. In addition, we quantify the level of feedback they received in their reviews, the length of their contributions, and their level of activity over their lifetime in the community. We find that users of both communities exhibit a large heterogeneity of social influence, and that helpfulness votes and age are significantly better predictors of the influence of an individual than sentiment. The most active of the analyzed communities shows a particular structure, in which the inner core of users is qualitatively different from its periphery in terms of a stronger positive and negative emotional expression. These results suggest that both objective and subjective aspects of reviews are relevant to the communication of subjective experience.

1. Introduction

The popularity of socially-powered online platforms has increased so much in recent years that, if we imagined a country with a population as large as the user base of Facebook, it would rank as the world's second largest country, with more than 1.23 billion active users at the end of 2013 [1]. Users interact online via different platforms for personal blogging, dating, online shopping, reviewing products, etc. The latter two kinds of platforms use their massive user communities to both collect and disseminate information: Users create and discover reviews, form opinions based on the experience of others, and ultimately make the informed decision of buying a product or not. Socially-powered platforms of this form are usually referred to as Social Recommender Systems (SRS) [2].

Similar to real-world social interactions, in online SRS platforms, some users manage to distinguish themselves from the rest by acquiring fame and social influence. If seen from a graph's perspective, some nodes become more central than others, but how this process works is not clear for real and online networks alike. How can a user increase its social influence and visibility? Are there any similarities in the career path of successful users? In this article, we address these questions by performing an empirical analysis on two datasets of online SRS that contain both product reviews and explicit social networks. Information is transferred in these systems through social ties, by means of social recommender filtering, which selects products and reviews from the peers that a user trusts. This functionality creates a spreading process through the social network that offers opportunities for viral marketing [3], using the social capital of online communities to maximize the visibility of a product [4].

The emotional content in product reviews is an interesting resource not only to overcome the bias present in ratings, but also for the role emotions play in human communication and product evaluation. Studies in social psychology show that people find emotional information more interesting than non-emotional information, and that they show more engagement with emotional narrators [5]. Additionally, the social link between narrator and listener has been observed to strengthen when emotions are involved [6]. We are interested in testing these social theories and assessing whether they also hold in online recommender systems: Does a user who shares its emotions have a larger impact in the community? Do users prefer neutral product evaluations or, on the contrary, is the personal experience, as emotional as it can be, considered more valuable?

In the theory of core affect [7], emotions are partially conscious, short-lived internal states, as opposed to the nature of opinions. A reviewer might not be fully aware of its own emotions, and if asked a long time after making the review, these emotions would have relaxed or disappeared, while its opinion about a product would remain. There is an expected overlap between rating and emotional classification [8], but the properties and social dynamics of opinions and emotions differ. For example, disclosure of emotions has been shown to be a better predictor for social connection than the sharing of facts and information [9], and collective emotions pose additional questions regarding collective identity, social action, and emergent phenomena in human societies [10].

The topic of social influence and spreading processes in social networks has attracted increasing attention, due to the presence of frequent cascades and viral phenomena in social systems. Influence processes have been studied in the context of rumor spreading in social networks [11]. To identify social influence, traditional measures focused on the concept of centrality [12], often measuring it as degree or betweenness centrality [13]. Recent works have shown that coreness centrality [14, 15] outperforms degree and betweenness centrality in detecting influentials in data-driven simulations [16], leading to applications to political movements [17, 18], scientific rumors [19, 20], gender inequality in Wikipedia [21], and cascades of users leaving a social network [22].

Finding influentials is often motivated by viral marketing, aiming at the maximization of the reach of a marketing campaign and user adoption [4, 23, 24]. Beyond purchase decisions, users of social recommender systems create star ratings and write reviews that can influence product adoption. The straightforward manner to analyze these reviews is to take into account the star rating as a measure of consumer satisfaction. This approach has proved useful in the field of recommender systems [2, 25]. On the other hand, self-selection biases make it difficult to analyze star-rating distributions, as their high bias reduces the heterogeneity of user evaluations, following a J-shaped distribution [26].

The large number of product reviews in a social recommender system produces a state of information overload [25]. This kind of information overload influences the priority processing patterns of individuals [27]. Works in psychology identify emotions as one of the mechanisms for priority assignment: while we seek positive experiences, negative ones make us react faster [28]. This leads to a stronger influence of emotions in social sharing [29], which also appears in product reviews [8]. Cascades of emotional expression through social interaction have been identified in the context of chatrooms [30] and political movements [18], as well as in experimental [31] and field studies in social psychology [32]. Furthermore, pieces of information are more likely to be shared in a social context when they contain stronger emotional content, as has been shown for the case of urban legends [33].

Sentiment analysis tools allow researchers to process and analyze emotions in large scale datasets. Different techniques can be used to extract emotional content from short, informal texts [34, 35], with SentiStrength being one of the leading tools for sentiment analysis in this context [36, 37]. Product reviews are much longer and better composed than tweets or YouTube comments, calling for the application of established lexicon-based techniques based on human annotation of words [35, 38]. These techniques have proved useful to reveal patterns of depressive moods [39] and to analyze the dynamics of happiness of whole societies [38]. We chose to apply this kind of lexicon-based sentiment analysis tool due to its previous validation with large, formal texts, and because it can be extended to other languages [40].

To explore the role of emotions and activity in the social influence of users of product reviews communities, we empirically quantify user behavior in various aspects. First, we analyze the trust network of two independent online communities, measuring social influence in relation to spreading processes in social networks [41]. We compute the coreness centrality of all users [14], and validate that it serves as an indicator of the spreading potential of users. Second, we measure emotions in product reviews by means of sentiment analysis, and aggregate these values into an emotional expression profile for each user. Combining this subjective information with other objective dimensions, such as age in the community and review votes, we create extended user profiles with rich behavioral information. Third, we analyze the signatures of emotional expression across the different centrality values of each network, testing the existence of patterns of emotional expression.

2. Materials and Methods

2.1. Product Reviews Communities Data

We base our empirical analysis on two independent datasets based on two trust-based product reviews communities: Dooyoo¹ and Epinions². Dooyoo claims to be a “social-shopping platform which helps consumers make informed purchasing decisions”³. Similarly, Epinions is a product comparison website which features product reviews with a social component [42]. Both platforms are intended for English-speaking users, and allow them to post written reviews about products with a star-rating from 1 to 5. A particularly interesting feature of these two communities is that both allow the creation of directed social links that can be defined as trust and distrust links toward other users. Distrust links are not publicly available on the website, and for that reason our study is restricted only to trust links. These links are directional, meaning that the origin of the link trusts the destination of the link, as a way to acknowledge the quality of the reviews of the trusted user. The motivation for the creation of these links is advertised in both platforms as a way to improve product recommendations, as their recommender systems would refine the way they filter information based on this explicit trust [25].

Both platforms are product-generic, in the sense that users can review products in multiple categories, not limited to books or software. Apart from reviewing and creating trust links, users can also provide feedback about the quality of product reviews written by other users. This evaluation is done by clicking a helpful/unhelpful button, which the website uses to measure the helpfulness of a review as the aggregation of the votes of all users. This feedback feature is particularly relevant in Dooyoo, where users have the possibility of receiving money from the website as a reward for the creation of useful reviews⁴. In both communities, each review has a helpfulness score summarized as Very helpful, Somewhat helpful, Helpful, Not helpful, or No feedback if the review received neither positive nor negative votes.

In our network datasets, nodes represent users, and a directed link from user u₁ to user u₂ means that u₁ explicitly trusts u₂. In both communities, users are allowed to see all the reviews created by all the other users, i.e., there are no private reviews. This means that there is a global information flow between users, which does not necessarily depend on the trust network. On the other hand, both websites advertise that their recommender systems take into account trust links in order to personalize recommendations. This implies that the trust network exercises a “filtering influence,” increasing the visibility and impact of the reviews of user u₂ for user u₁, if u₁ trusts u₂. This opens the question of the role of the trust network, especially when users are allowed to see all the reviews and can vote on any review as helpful or unhelpful, regardless of the trust network.

For Dooyoo, we gathered a dataset which we refer to as the DY dataset. Datasets on Epinions are available from previous work [42], but to the best of our knowledge, none of them used the text of the reviews for extracting additional information beyond ratings. Therefore, we performed a web crawl on Epinions and fetched, besides the trust network, the text of reviews. The raw data was further cleaned up by removing duplicate reviews, users, etc. We will refer to this dataset as the EP dataset. This second dataset is smaller, in terms of number of users, trust links, and reviews, than the version used in Walter et al. [25], but contains richer information, including review text and helpfulness feedback. As shown in Table 1, the DY dataset contains roughly half the number of users of the EP dataset; however, the number of users that contributed at least one review is roughly the same. More details on the distributions of lifetimes and activity levels can be found in the Supplementary Information.

TABLE 1

Table 1. Descriptive statistics on the Dooyoo and Epinions datasets.

2.2. User Sentiment Analysis

The star-rating of a review provides the explicit opinion given by the user, but the emotional content is not acknowledged when making the review, contrary to other communities like Livejournal [43]. For this reason, we apply a sentiment analysis technique that extracts an estimation of the valence v, which represents the amount of pleasure or displeasure associated with an emotional experience [44]. Among other dimensions that can be used to measure emotions [45], valence is the one that explains the most variance of emotional experience [46, 47]. This technique analyzes each word in the review by looking into a lexicon on word valence, providing an estimation of v as the mean valence of the words appearing in the text (for more details see Supplementary Information). Then, this value of valence is compared with the baseline distribution of the valence for emotional words in generalized text, as estimated from a large dataset from web crawls [40]. If the valence of a review r is above a threshold given this baseline distribution, the review is classified as positive (e_r = 1), if it is below another threshold, it is classified as negative (e_r = −1), and if it is between both it is classified as neutral (e_r = 0).
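The classification step described above can be illustrated with a minimal sketch. The toy lexicon, its 1 to 9 valence scale, the two threshold values, and the choice to treat reviews without lexicon words as neutral are all assumptions made for illustration; the actual lexicon and the baseline-derived thresholds are those described in the Supplementary Information.

```python
from statistics import mean

# Hypothetical toy lexicon (word -> valence); values are invented for illustration.
TOY_LEXICON = {"great": 8.0, "love": 8.5, "fine": 6.0,
               "broken": 2.5, "awful": 2.0, "box": 5.0}

# Hypothetical thresholds; the study derives them from a baseline distribution.
LOWER, UPPER = 4.5, 6.5


def review_valence(text, lexicon=TOY_LEXICON):
    """Mean valence of the lexicon words found in the text, or None if none match."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return mean(scores) if scores else None


def classify_review(text, lower=LOWER, upper=UPPER, lexicon=TOY_LEXICON):
    """Return e_r: 1 (positive), -1 (negative), or 0 (neutral)."""
    v = review_valence(text, lexicon)
    if v is None:
        return 0  # assumption: no lexicon words means neutral
    if v > upper:
        return 1
    if v < lower:
        return -1
    return 0
```

With these toy values, a review is labeled from the mean valence of its matched words only; words outside the lexicon do not contribute.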

Given the emotional classification of each review, we calculate the degree of positivity, negativity, and neutrality of every user, by aggregating its emotional scores over the whole number of reviews it contributed in the following way:

\begin{matrix} \begin{array}{l} P_{u} = \frac{1}{| R_{u} |} \sum_{r \in R_{u}} Θ [e_{r} = 1] \\ N_{u} = \frac{1}{| R_{u} |} \sum_{r \in R_{u}} Θ [e_{r} = - 1] \\ U_{u} = \frac{1}{| R_{u} |} \sum_{r \in R_{u}} Θ [e_{r} = 0] \end{array} & (1) \end{matrix}

where R_u is the set of reviews written by the user u, |R_u| is the number of reviews created by u, which is a metric for the amount of information it contributes to the community, and Θ[x] is an indicator function that returns 1 if its argument is true and 0 otherwise. These three metrics contain additional information about user behavior that is not contained in the average star-rating of a user.
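As a minimal sketch of Equation 1, the three ratios can be computed by counting the labels e_r of one user's reviews. The function name and the input format (a plain list of labels) are assumptions for illustration.

```python
def emotional_ratios(labels):
    """Return (P_u, N_u, U_u) for one user's review labels e_r in {1, -1, 0}."""
    n = len(labels)
    if n == 0:
        raise ValueError("user has no reviews")
    positive = sum(1 for e in labels if e == 1) / n
    negative = sum(1 for e in labels if e == -1) / n
    neutral = sum(1 for e in labels if e == 0) / n
    return positive, negative, neutral
```

By construction the three values sum to 1 for any user with at least one review.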

Intuitively, one could expect that a successful user, a professional product reviewer, creates neutral, rigorous reviews, without emotional charge, in a similar fashion in which a journalist would write news and articles. However, in both datasets, we find that a large fraction of the reviews are positively charged, i.e., the user presents the product or service in a favorable manner by using positively emotional words. Reviews with negative emotions are less frequent than positive ones, but they are significantly present. These ratios are presented in Table 2.

TABLE 2

Table 2. The fraction of positively, negatively charged, and neutral reviews.

2.3. Network Analysis

We quantify the social influence of users of Dooyoo and Epinions by analyzing their respective social networks. First, we compute a set of descriptive statistics on each network, measuring diameter, reciprocity, and path length, and finding the largest weakly and strongly connected components. These metrics are included in Table 3, showing that a significant difference between the two datasets is the size of their largest strongly and weakly connected components. Beyond that difference, the remaining statistics are relatively similar, displaying typical properties of social networks such as low average path length and diameter. The reciprocity for both networks is relatively low, in line with previous findings on Twitter [48].

TABLE 3

Table 3. Network statistics of the analyzed datasets, Dooyoo and Epinions.

We measure the level of social influence of a user through the k-shell decomposition of the social network [14, 15, 18, 49]. We measure the influence of a node by its coreness centrality k_s, which is the state-of-the-art metric to measure influence in social networks, as it is the best known predictor for the size of cascades [16].

In general, the k-shell decomposition of a graph is obtained by recursively removing all its vertices with degree less than or equal to k, until all the remaining vertices have degree of at least k + 1. The removed vertices are labeled with a shell number (k_s) equal to k, and the procedure is repeated for increasing values of k. For our study, we choose to collapse links into undirected ones, using as degree the sum of unidirectional and bidirectional links of a user. The reason for this stems from previous studies on Twitter, which show that the undirected k-shell decomposition of follower networks can predict empirical cascades of tweets in various phenomena [17, 50].
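The pruning procedure can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code used for the analysis; it assumes that a reciprocated pair of trust links collapses into a single undirected link and that self-links are ignored.

```python
from collections import defaultdict


def k_shell(directed_edges):
    """Coreness k_s per node, after collapsing directed links into undirected ones."""
    adj = defaultdict(set)
    for u, v in directed_edges:
        if u != v:  # assumption: self-links are ignored
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    coreness = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        to_remove = [node for node in remaining if degree[node] <= k]
        if not to_remove:
            k += 1
            continue
        while to_remove:
            node = to_remove.pop()
            if node not in remaining:
                continue  # already pruned through a duplicate entry
            remaining.discard(node)
            coreness[node] = k
            for neighbor in adj[node]:
                if neighbor in remaining:
                    degree[neighbor] -= 1
                    if degree[neighbor] <= k:
                        to_remove.append(neighbor)
    return coreness
```

For example, a hub connected only to four leaves has degree 4 but ends up in shell 1, while the members of a triangle have degree 2 or 3 and end up in shell 2.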

With the k-shell decomposition we are able to obtain a ranking of nodes which is related to a hierarchical organization in terms of importance, as illustrated in Figure 1. The larger the k_s of a node, the more influential it is. We should note that the coreness centrality is, in general, highly correlated with the degree centrality. However, there is no one-to-one relation since, as shown in Figure 1, a node can have a large degree and still be located in an external shell. Figure 2 shows the networks visualized with LaNet-vi [51], in which nodes have a color and position corresponding to their coreness.

FIGURE 1

Figure 1. Example of a k-shell structure. Nodes in the same k-shell have the same coreness centrality k_s. A high degree is not a sufficient condition for a high coreness, for example for the case of the yellow node.

FIGURE 2

Figure 2. k-shell structure of the trust networks, where nodes have a color and position according to their coreness, and size according to their degree. The plots were created using the LaNet-vi software [51].

3. Results

3.1. Network Position and Social Influence

3.1.1. Heterogeneity of Coreness

For the EP network we find 126 shells, while for the DY network we find 84 shells. The distribution of coreness values k_s of both networks, shown in Figure 3, is skewed and reveals that the location of users in the k-shells follows similar patterns. The majority of users are located in the periphery of the network, and only a small fraction of them is placed in the more central k-shells. However, although the EP network is almost twice as large as the DY network (see Table 1; the LCC of EP is more than three times the LCC of DY), the number of users in the more central k-shells is similar in both networks. This means that the number of very central users is not directly proportional to the total number of users in a network; thus, there should be other factors determining users' centrality.

FIGURE 3

Figure 3. Frequency of coreness values for DY (triangles) and EP (squares). Inset: Probability density function of coreness values.

The heterogeneity of the distributions of k_s values becomes evident when fitting power-law distributions to the empirical data. Applying a maximum likelihood criterion that minimizes the Kolmogorov-Smirnov distance between empirical and theoretical distributions [52], we find that both distributions can be explained by truncated power laws of exponent α_EP = 1.39 ± 0.004 for EP and α_DY = 1.207 ± 0.005 for DY. This result is robust, since log-likelihood ratio tests vs. log-normal and exponential alternatives give positive and significant values, i.e., the power-law distribution explains the distribution of k_s significantly better than its non-scaling alternatives.
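To illustrate the two ingredients of such a fit, the following sketch estimates the exponent of a plain continuous power law by maximum likelihood and reports the Kolmogorov-Smirnov distance to the fitted model. It is a simplification under stated assumptions: it treats the data as continuous, uses a fixed lower cutoff `x_min` supplied by the caller, and does not implement the truncation or the likelihood ratio tests used for the results above.

```python
from math import log


def fit_power_law(samples, x_min):
    """Return (alpha, ks) for p(x) ~ x^(-alpha) on x >= x_min (continuous case)."""
    tail = sorted(x for x in samples if x >= x_min)
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two samples above x_min")
    log_sum = sum(log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all samples equal x_min")
    # Maximum likelihood estimate of the exponent.
    alpha = 1.0 + n / log_sum
    # Kolmogorov-Smirnov distance between empirical and fitted CDFs.
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        ks = max(ks, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return alpha, ks
```

In a fuller treatment, `x_min` would itself be chosen by scanning candidate values and keeping the one with the smallest distance.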

3.1.2. Social Influence Simulation

One of the goals of social networks is to facilitate information exchange between their users, i.e., information from user A can reach user B through the network link connecting them. Subsequently, the same piece of information can be forwarded by user B to user C through their respective link, and so on. This is an example of a classical spreading process taking place in a network topology [41]. In product review communities an underlying explicit social network facilitates information exchange about products (i.e., reviews). For example, when a review is created, the peers of the author will get access to new information and they have the option to either read it (and become informed) or not. Therefore, a natural way to simulate information propagation in such systems is by means of a Susceptible-Infectious (or, better suited to our case, Susceptible-Informed, SI) model. Such models have been used widely in the literature to describe processes like the spreading of epidemics, rumors, economic crises, etc. [53–58].

We perform large scale computer simulations of spreading processes, assuming that users stay informed after reading a review, i.e., users do not return to the susceptible state. This SI process is modeled as follows: starting from the explicit social network (DY or EP), we choose a user at random and we assume it will try (through the creation of a review) to spread information to all users it is connected to. The probability that a targeted user becomes informed by reading the review is β, and remains constant throughout the simulation. Next, the informed users will try to pass this information to all their neighbors, and so on. This process is terminated after all informed users have tried to propagate information through their respective connections. For both networks, we perform 10 runs initiating the spreading process from a specific user, and we repeat this sequentially for every user in the network using probability of infection β∈[0.1, 0.6] with step Δβ = 0.1.
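A minimal sketch of one such simulation is given below. It assumes undirected links, one transmission attempt per informed user and neighbor, and a fraction f that counts the initial user as informed; the function names and the fixed random seed are choices made for illustration.

```python
import random
from collections import defaultdict, deque


def si_spread(adj, seed, beta, rng):
    """One SI run: each informed user tries once to inform each of its neighbors."""
    informed = {seed}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for neighbor in adj[user]:
            if neighbor not in informed and rng.random() < beta:
                informed.add(neighbor)
                queue.append(neighbor)
    return len(informed)


def mean_informed_fraction(edges, seed, beta, runs=10, rng_seed=0):
    """Average fraction of informed users over several runs from the same seed."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rng = random.Random(rng_seed)  # fixed seed for reproducibility
    n = len(adj)
    total = sum(si_spread(adj, seed, beta, rng) for _ in range(runs))
    return total / (runs * n)
```

Repeating `mean_informed_fraction` for every user and averaging the result within each k-shell gives curves of f against k_s for each value of β.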

In Figure 4, we plot the average fraction f of users that become informed from reviews created by users belonging to a k-shell vs. the k-shell number (k_s). In agreement with [16], we find that information initiated by the more central users in terms of k_s can reach a larger percentage of users in both networks. Therefore, the incentive of increasing one's impact in the network is correlated with the network centrality. As a result, if users want to increase the impact of the transmitted information, they should try to become more central.

FIGURE 4

Figure 4. Average fraction f of informed population by reviews created from users of different k-shells vs. the k-shell number (k_s). The different curves show results for different probabilities β∈[0.1, 0.6], with the lower curves corresponding to smaller β's. Inset: Average fraction f of informed population by reviews created from users of the lowest (circle) and the highest (diamond) k-shell vs. probability of information transmission β. Left panel: DY. Right panel: EP.

In the left panel of Figure 5 we plot the average fraction, f_c, of the network that becomes informed by a review created by users belonging to the Largest Connected Component (LCC) of the network vs. the probability of transmission β. Besides the expected trend that f_c increases with the probability β, the panel shows that in the DY network f_c can reach much higher values for the same β than in the EP network. This result suggests that the DY network allows more efficient information transmission than the EP network, if we only consider the LCC. If we consider the full network, however, the situation is reversed. This can be attributed to the different connectivity patterns observed in the two communities (as shown in Table 1), where for EP the largest connected component contains almost 90% of the nodes, while for DY this percentage is almost 40%.

FIGURE 5

Figure 5. (A) Average fraction f_c of informed population by reviews created from users of the LCC of the network vs. the probability of transmission β. The error bars stand for the standard deviation. (B) Average fraction f of informed population by reviews created from users of different k-shells vs. the k-shell number (k_s) for the case of DY with β∈[0.1, 0.3]. The solid line corresponds to the assumption that information propagates contrary to the directionality of the link, and the dashed line to the assumption that information propagates following the directionality of the link.

We calculate topological features of users measured through the k-shell decomposition neglecting any possible effect of directionality in the links that connect them. However, the evolution of a dynamical process on a network could be heavily affected by the presence of directed links. Thus, in order to test whether link directionality affects our conclusions, we apply the SI model to the DY network assuming two distinct hypotheses: (a) that information flows according to the direction of the links, and (b) that information flows inversely to the direction of the links. The right panel of Figure 5 shows the fraction f vs. k_s for both hypotheses. In general, we find that for k_s > 5 the link directionality does not heavily influence the process of spreading; thus, the results we discussed in the previous analysis are valid for both cases. In what follows we try to identify the profile of the more central users, in order to understand whether there are common patterns in their behavior. After all, it is natural to assume that they did not end up being central purely by “luck.”

3.2. User Production

3.2.1. Helpfulness

Users give feedback on the quality of other users' reviews by voting individual reviews as helpful or unhelpful. In both communities, each review has a helpfulness rating calculated as a combination of these votes. The helpfulness rating h_r is displayed along with a review r on a qualitative scale of four grades: “very useful,” “useful,” “somewhat useful,” and “not useful.” We map these ratings on a scale from 0 (not useful) to 4 (very useful), in order to quantify the impact of a review in the community. Table 4 contains the ratios of each type of feedback in EP and DY.

TABLE 4

Table 4. Ratios of community feedback values for the reviews of each dataset.

Given this measure of helpfulness of a review, for each user u we can calculate a value of total helpfulness

\begin{matrix} h_{u} = \sum_{r \in R_{u}} h_{r} & (2) \end{matrix}

which is a sum of all the helpfulness scores attributed by the community to the reviews created by the user, R_u. Figure 6 shows the distribution of the values of h_u in each community. This figure reveals the large heterogeneity in the helpfulness of users, where most users have very few helpful reviews, while some others accumulate large amounts of positive feedback from the rest. The two communities differ in the shape of this heterogeneity, as in DY there are significantly larger amounts of users with high helpfulness in comparison with EP.
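As a small sketch of Equation 2, the total helpfulness can be accumulated per user from review-level scores. The input format, a sequence of (user, h_r) pairs with h_r already mapped to a number, is an assumption for illustration.

```python
from collections import defaultdict


def total_helpfulness(reviews):
    """Return {user: h_u}, where reviews is an iterable of (user, h_r) pairs."""
    totals = defaultdict(int)
    for user, h_r in reviews:
        totals[user] += h_r
    return dict(totals)
```

Because h_u is a sum rather than an average, it grows with both the number of reviews a user writes and the feedback each one receives.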

FIGURE 6

Figure 6. The distribution of the total helpfulness (h_u) of users for DY (triangles) and EP (squares).

While the distribution of h_u in EP is very irregular, it seems to follow a stylized broad distribution in DY. While the tail is not long enough to verify a power-law distribution [59], we tested the possibility of a log-normal distribution. A maximum likelihood estimation, discussed in the Supplementary Information, gives a set of parameters that fail to fit the tail of the distribution, leading us to reject the log-normal hypothesis. This initial observation indicates the existence of a process of helpfulness accumulation that creates larger heterogeneity than the one present in a log-normal distribution, but we do not have enough data to precisely explore its properties at larger scales.

3.2.2. Ratings and Emotions

Product reviews contain factual information about properties of the product and its experienced quality from the reviewer's point of view. In the two communities we study, as discussed above, a product review contains two elements: a star rating, which summarizes product experience in the form of an opinion, and a review text with detailed information written by the user. The straightforward manner to analyze these reviews is to take into account the star rating as a measure of consumer satisfaction with the product. This approach has proved useful in the field of recommender systems [2, 25, 60, 61]. On the other hand, self-selection biases make it difficult to analyze star-rating distributions, as their high bias reduces the heterogeneity of user evaluations, following a J-shaped distribution [26]. This is the case for both EP and DY, where the distribution of star-ratings of the reviews follows a J-shaped distribution, as shown in Figure 7. Most of the reviews have star ratings ≥ 4, with a small increase in the number of 1-star reviews in comparison with 2-star reviews. In addition, user average ratings suffer from this bias, as shown in Figure 8. To overcome this limitation, we study the emotions expressed in the text of the review, as explained below.

FIGURE 7

Figure 7. Distribution of ratings in the reviews of EP (dark) and DY (light). Both distributions show a strong bias toward positive ratings, with a moderated J-shape.

FIGURE 8

Figure 8. Scatter plot of user average ratings r_u vs. user emotional ratios for negative (N_u, left), neutral (U_u, center), and positive (P_u, right) reviews. The histograms show the distributions of each variable.

Figure 8 shows the scatter plots of the user ratios of emotional expression vs. the average rating of users, with the corresponding distributions in each axis. We can clearly observe how the average rating of users, r_u, is skewed with a mean around 4, while the ratios N_u, U_u, and P_u have different distributions between 0 and 1. The pairwise Pearson correlation coefficients of r_u with each of the other three variables have absolute values below 0.25, indicating that there is significant variance of the emotional expression of users that is not captured by the ratings. The three metrics N_u, U_u, and P_u provide us with additional data beyond the simple average rating provided by a user, profiling the different types of users by the way they express their emotions in the reviews they create.
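For reference, the Pearson correlation coefficient used in this comparison can be computed as in the following sketch, where the two input lists would hold, for example, r_u and P_u for the same users in the same order. The function name is chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)
```

A coefficient with absolute value below 0.25 corresponds to less than about 6% of shared variance (0.25² = 0.0625) between the two variables.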

3.3. The Profile of Influential Users

We test whether there are user specific features associated with an increased coreness of the user k_u and thus with an increased user social influence. For our analysis, we use a linear regression technique on a logarithmic transformation of k_u, using the behavior metrics explained above as independent variables. This technique of substitution models has been used before to study the relation between Facebook user popularity and personality metrics from a survey [62]. In our case, we fit the following model:

\begin{equation}
\begin{aligned}
\log(k_u + 1) = {} & \alpha + \beta_P P_u + \beta_N N_u + \beta_R r_u + \beta_T \log(t_u) \\
& + \beta_H \log(h_u + 1) + \beta_W \log(w_u)
\end{aligned}
\tag{3}
\end{equation}

The dependent variable is a transformation of the coreness in two ways: (i) taking the logarithm provides a monotonic transformation that decreases the variance of k_u, as its distribution is right skewed (see Figure 2), and (ii) an increment of 1 includes in our analysis active but disconnected nodes with k_u = 0. The independent variables of our model capture the different metrics of user behavior explained above. The first two variables, P_u and N_u, account for the emotional expression of the user. We omit the ratio of neutral messages U_u, as its redundancy with the previous two would lead to a singularity due to the identity P_u + N_u + U_u = 1. The third variable, the average rating of the user r_u, accounts for the style of the user in condensing opinions into a precise number. The fourth variable is the lifetime of the user in the community t_u, as explained in Section 3.2.1. This variable accounts for heterogeneity in the age of users, and it might play a relevant role in the impact a user can have in the product reviews community. The fifth variable is a transformation of the total helpfulness of the user h_u, following the same principle as for the dependent variable. Finally, the last variable is the logarithm of the average number of words in the reviews of the user, log(w_u), as a proxy for the amount of unfiltered information in a typical review of the user, which could have an effect on the user's relevance in the community (for more details on the number of words in reviews, see SI).

We fit Equation 3 by first normalizing each variable and then solving the linear regression by the method of least squares, obtaining the results summarized in Table 5. Our first observation is that the linear regression is different for the two datasets. The R² for DY is 0.6174, while for EP it is 0.1751. This indicates that the data we obtained for Dooyoo allow us to better estimate the social influence of a user from their activity, in comparison with the EP dataset. Second, in both cases the largest significant coefficient is that of the total helpfulness of the user. This shows that the total helpfulness and the k-shell number of a user are directly related. In other words, a user tends to be more central, and therefore more important in the community, when they contribute many helpful reviews.
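The fitting procedure can be sketched in a few lines of plain Python. The sketch below assumes that "normalizing" means z-scoring each transformed variable, in which case the intercept vanishes and the coefficients follow from the normal equations; this is an assumption on our part for illustration, and the dictionary keys and function names are hypothetical. It is not the code used to produce Table 5 and it does not compute p-values.

```python
import math


def transform(user):
    """Map raw user metrics to the regressors and response of Equation 3.

    user: dict with hypothetical keys P, N, r, t, h, w, k.
    """
    x = [user["P"], user["N"], user["r"], math.log(user["t"]),
         math.log(user["h"] + 1), math.log(user["w"])]
    y = math.log(user["k"] + 1)
    return x, y


def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_standardized(columns, y):
    """Least-squares fit of z-scored y on z-scored columns.

    Returns (betas, r_squared); no intercept is needed after centering.
    """
    z = [zscore(c) for c in columns]
    zy = zscore(y)
    p, n = len(z), len(zy)
    # Normal equations: (Z^T Z) beta = Z^T zy
    a = [[sum(z[i][k] * z[j][k] for k in range(n)) for j in range(p)]
         for i in range(p)]
    b = [sum(z[i][k] * zy[k] for k in range(n)) for i in range(p)]
    betas = solve(a, b)
    fitted = [sum(betas[i] * z[i][k] for i in range(p)) for k in range(n)]
    ss_res = sum((zy[k] - fitted[k]) ** 2 for k in range(n))
    ss_tot = sum(v ** 2 for v in zy)
    return betas, 1 - ss_res / ss_tot
```

Calling `transform` on every user, transposing the resulting rows into six columns, and passing them to `fit_standardized` would yield standardized weights comparable in spirit to those of Table 5. Because all variables share the same scale after z-scoring, the magnitudes of the coefficients can be compared directly.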

Table 5. Linear regression coefficients and p-values for log(k_u + 1) from the rest of the user metrics (normalized), for Dooyoo (DY) and Epinions (EP).

The second largest weight for the users in DY corresponds to the lifetime of a user in the community t_u, with a significant positive value. This means that users who have been in the product reviews community longer also have higher coreness. For EP, the average length of the reviews created by a user is the second most important factor for centrality. As with lifetime in DY, w_u is less relevant than the total helpfulness, implying that the community is not concerned about the size of reviews but rather about their overall quality.

Focusing on the relation between the coreness of a user and their total helpfulness, we computed Pearson's correlation coefficient between log(h_u + 1) and log(k_u + 1), obtaining a value of 0.677 ± 0.006 for DY and 0.337 ± 0.01 for EP, both with p < 0.001. We therefore conclude that the total helpfulness of a user is a good predictor of their network centrality, as both variables are significantly correlated in both datasets. Figure 9 shows the mean coreness values for users of different helpfulness levels. Both communities display a clear relation between the two variables: users with higher amounts of helpful reviews also have more social influence.
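The aggregation behind a plot of mean coreness per helpfulness level can be sketched as follows. This is a minimal illustration that assumes users are grouped into bins of fixed width on log(h_u + 1); the bin width, the function name, and the input layout are hypothetical choices and not necessarily those used for Figure 9.

```python
import math
from collections import defaultdict


def binned_mean_coreness(helpfulness, coreness, bin_width=1.0):
    """Group users by floor(log(h + 1) / bin_width) and summarize coreness.

    Returns {bin_index: (mean, standard_error, n_users)}.
    The standard error is NaN for bins with a single user.
    """
    bins = defaultdict(list)
    for h, k in zip(helpfulness, coreness):
        bins[int(math.log(h + 1) // bin_width)].append(k)

    summary = {}
    for index, values in sorted(bins.items()):
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            std_error = math.sqrt(variance / n)
        else:
            std_error = float("nan")
        summary[index] = (mean, std_error, n)
    return summary
```

Each entry of the returned dictionary corresponds to one point with its error bar; a narrower `bin_width` gives a finer curve at the cost of noisier means.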

Figure 9. Dependence of the coreness k_s on the logarithm of the helpfulness of users (DY in red, EP in black). Points are mean values of k_s and error bars show the standard error. Helpfulness serves as a predictor for coreness in both communities.

Testing the role of emotionality ratios and average rating in the results of Table 5, we notice that all three variables have very low regression weights. P_u and N_u have low significance in DY, and N_u is not significant in EP. This indicates that the role of emotions in social influence cannot be observed through this analysis at the individual level, and that helpfulness and age are more predictive variables.

3.4. The Emotional Core of Dooyoo

Motivated by the theory of collective emotions [10], we tackle the question of how the aggregated emotions of users in different k-shells differ. For a given coreness number k_s, we aggregate the activity of all the users in that shell by the average values 〈P〉_s, 〈U〉_s, 〈N〉_s, calculated over all the users with coreness k_s. The emotional profile of the users in different k-shells can be observed in Figure 10, where each k-shell is represented by a semicircle whose distance to the center depends on its coreness number. Each shell has three colors that range from the minimum to the maximum values of 〈N〉_s, 〈U〉_s, and 〈P〉_s. For both communities, k-shells closer to the core have stronger negativity and weaker neutrality. It is important to notice that, even though these emotions increase within their individual ranges, the maximum values of 〈N〉_s in DY still remain lower than the other two average ratios.

Figure 10. Representation of the average emotional expression of the nodes of each k-shell, for EP (left) and DY (right). Each circle represents the nodes with a particular k-shell number, with a distance from the center inversely proportional to their coreness. Circles are colored in three intervals according to 〈N〉_s, 〈U〉_s, and 〈P〉_s, ranging from minima to maxima as indicated by the color bars.

A close inspection of Figure 10 shows a pattern in DY that does not appear in EP: there is an inner core composed of some shells with high coreness number that have stronger average emotion indicators, as compared with the rest of the shells with lower k_s numbers. This difference between inner and outer parts is described by a critical value k_c, which marks a stronger emotional expression for k-shells with k_s at least k_c (the core), in comparison with the weaker emotional expression of those with k_s < k_c (the periphery).

We test the existence of this core by a set of Wilcoxon tests, dividing each community into users with k-shell number above and below different values of k_c. Figure 11 shows the Wilcoxon distances Δ of 〈N〉_s, 〈U〉_s, and 〈P〉_s between the core and periphery, for values of the division k_c from 1 to the maximum coreness number. For EP we did not find any significant nonzero distances separating the neutral and negative average scores of the inner and outer parts. For DY, on the other hand, the scenario is different. At k_c = 68 there is a sharp transition that indicates a maximal distinction between core and periphery, highlighting the existence of a more emotional central subcommunity.
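The scan over division values can be sketched as below. The sketch interprets the Wilcoxon distance as the Hodges-Lehmann location shift (the median of all pairwise differences between core and periphery values) and uses a two-sided rank-sum test with a normal approximation and no tie correction; both choices are assumptions made for illustration, as are the function names and the input layout, and they may differ from the exact procedure behind Figure 11. The pairwise computations are quadratic in the number of values, so this form is only practical for moderately sized inputs.

```python
import math
from statistics import median


def location_shift(core, periphery):
    """Hodges-Lehmann estimate: median of all core-minus-periphery differences."""
    return median(c - p for c in core for p in periphery)


def rank_sum_p(core, periphery):
    """Two-sided Mann-Whitney p-value (normal approximation, no tie correction)."""
    n1, n2 = len(core), len(periphery)
    # U counts pairs where the core value exceeds the periphery value (ties count 0.5).
    u = sum(1.0 if c > p else 0.5 if c == p else 0.0
            for c in core for p in periphery)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def scan_divisions(pairs, min_size=2):
    """Compare core (k >= k_c) and periphery (k < k_c) for every candidate k_c.

    pairs: list of (coreness, value), e.g. a shell or user emotion ratio.
    Returns {k_c: (shift, p_value)}, skipping splits with too few values.
    """
    results = {}
    for k_c in sorted({k for k, _ in pairs}):
        core = [v for k, v in pairs if k >= k_c]
        periphery = [v for k, v in pairs if k < k_c]
        if len(core) < min_size or len(periphery) < min_size:
            continue
        results[k_c] = (location_shift(core, periphery),
                        rank_sum_p(core, periphery))
    return results
```

Running `scan_divisions` once per emotion ratio and plotting the shift against k_c gives curves of the kind discussed above, in which a sharp change in the shift would point to a candidate core boundary.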

Figure 11. Wilcoxon distances for averaged user emotions between divisions into core and periphery, depending on the division value k_c, for EP on the left and DY in the center. Right: Wilcoxon distances between inner and outer parts of DY.

The significant separation of DY into core and periphery leads to a central core with stronger emotional expression. The right panel of Figure 11 shows the Wilcoxon distance between emotion ratios, comparing core and periphery divided by k_c = 68. The core has significantly higher negative and positive ratios, with a decreased neutrality ratio. This result is supported by the dependence of the p-value of the Wilcoxon test and the ratios of emotional expression on k_c, as shown in the SI.

4. Discussion

Our analysis of two online product reviews communities shows the relation between community feedback, emotions, and social influence within the trust network. We measure social influence by means of the coreness of individual users, and validate this metric using the SI process of information spread. Our findings show that, in line with previous research [16], the expected size of a cascade increases with the coreness centrality of the node it starts from. Furthermore, we analyze the heterogeneity of coreness through model fitting to the empirical distributions, finding that the coreness in both communities follows a power-law distribution. The exponents we found for these fits suggest that the mean and variance of coreness scale with system size, i.e., larger online communities serve as training grounds for even more influential users. Testing this type of scaling requires the analysis of several online communities, and remains open for future research.

We measure emotional expression in reviews through the ANEW lexicon, and aggregate the emotions of individual users into three scores for positivity, negativity, and neutrality. These three dimensions create a richer representation of individuals beyond average ratings, as emotional expression contains information not encoded in the star ratings of reviews. Combining these features with the lifetime in the community, the average review size in words, and the levels of helpfulness votes of the users, we find that total helpfulness and average review length are the most relevant indicators for individual social influence, beyond emotional expression. Our observational analysis of one snapshot of the system points to the relevance of emotions in social influence, but further research should test other individual and temporal aspects of this explanation. Experimental studies can isolate the individual components that drive the decisions and expressions of users. Data with temporal resolution in network formation should further explore the career path of influential users, measuring the changes in k-core values as a function of contributions and emotions.

Our statistical analysis shows the existence of a sharp transition in coreness that divides the Dooyoo community into two levels: an emotional core and a more neutral surface. This structure was absent in Epinions, raising the question of what process could create such a difference in the relation between topology and emotional expression. An initial conjecture would point to the different reward schemes of the two communities: Dooyoo offered monetary rewards to its most successful users, who created the emotional core of influential users. While our results at the individual level are inconclusive with respect to emotional expression, this characterization of emotions in a core-periphery structure suggests that the expression of emotions provides a medium for the communication of subjective experience. This kind of communication process would enhance the interaction of certain types of users, improving their social influence as a whole more than if they just wrote reviews with purely factual information. Understanding how such a pattern emerges from individual emotional interaction is a question open for future research, which could potentially link individual and collective patterns of emotions and social influence.

Author Contributions

DT gathered and processed data, DT and AG analyzed the networks, DT and DG performed statistical analyses, DT, DG, AG, and FS wrote the article.

Funding

This research has received funding from the European Community's Seventh Framework Programme FP7-ICT-2008-3 under grant agreement no 231323 (CYBEREMOTIONS).

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to thank Epinions.com and Dooyoo.co.uk for making their public reviews and trust data accessible.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/article/10.3389/fphy.2015.00087

Footnotes

1. ^http://www.dooyoo.co.uk.

2. ^http://www.epinions.com.

3. ^“About”-page of www.dooyoo.co.uk.

4. ^Description of monetary rewards in Dooyoo: http://www.dooyoo.co.uk/community/_page/advice_participate.

References

1. Facebook. Facebook Reports Fourth Quarter and Full Year 2013 Results (2014). Available online at: http://investor.fb.com/releasedetail.cfm?ReleaseID=821954

2. Victor P, Cornelis C, De Cock M. Trust Networks for Recommender Systems. Vol. 4. Springer Science & Business Media (2011). Available online at: http://www.springer.com/us/book/9789491216077

3. Leskovec J, Adamic LA, Huberman BA. The dynamics of viral marketing. ACM Trans Web (TWEB) (2007) 1:5. doi: 10.1145/1232722.1232727
