INTRODUCTION: We’ll begin by addressing why it is necessary to build a relationship between QoE and QoS.
The quality of a service has always been assessed differently by service providers and their customers. For service providers, quality is usually expressed in terms of specific network-level QoS parameters such as throughput, jitter and loss ratio, whereas customers perceive quality in subjective terms, as they are not interested in the technical parameters of the network. QoS is often used to refer to the disturbances that affect the customer’s experience, while the customer’s satisfaction is expressed in terms of QoE. So, service providers need to know how to relate network-level QoS to QoE in order to offer efficient services.
Both QoS and QoE are deemed to have multiple dimensions. QoS parameters can broadly be classified as application-level parameters (e.g. Stallings, Initial Delay, Resolution) and network-level parameters (e.g. Packet Loss, Packet Delay). Many theoretical models have been proposed so far to explain the relationship between QoE and QoS. Of them, the most common are:
IQX: proposes an exponential mapping between QoE and QoS.
Weber-Fechner: proposes a logarithmic mapping between QoE and QoS.
Stevens’ Power Law: proposes a power mapping between QoE and QoS.
The QoE influence factors of interest here are the Stallings and the Initial Delay (I.D). Two important aspects of Stallings are considered when studying their impact on QoE: 1. Duration of Stalls 2. Position of Stalls. When the combined effects of I.D and Stallings were studied, it was found that service interruptions had to be avoided, even at the cost of increased I.Ds, to get a better QoE; however, no multi-dimensional models were proposed. Which brings us to: why is it necessary to have multi-dimensional models? Because in a real scenario it is never one degradation after another, but multiple degradations acting at once and affecting the QoE.
So, the multi-dimensional models help us understand whether, when one of the QoS parameters is performing badly, improving the other QoS parameters will improve the overall QoE. In this context, the most commonly described models are the additive model, the multiplicative model and the mixed models. The additive model is a weighted sum of the effects of all the influence factors. The multiplicative model is a weighted product of the effects of all the influence factors. A mixed model, as the name suggests, is a mix of the additive and the multiplicative models.
We have studied the impact of the influence factors under non-adaptive video streaming conditions instead of the common HTTP adaptive streaming, because in adaptive streaming the quality of the video is adjusted to suit the changing network conditions, so it is not possible to study the underlying network problems. Therefore, throughout our study we used a non-adaptive video streaming setup.
The motivation for this thesis is that the latest research in this area attempted to derive multi-parameter QoE models from single-parameter QoE models based on theoretical assumptions, and pointed out the need for subjective test results to establish and prove those proposals. Therefore, we continued the research in this area and studied the combined impact of Initial Delay and Stallings on the video quality. We have also studied the impact of these two parameters in the presence of a third parameter, the Resolution.
METHODOLOGY: The setup consisted of a Server, a Shaper and a Client. A VLC player was installed on the Server and the Client. The Shaper was configured with a network emulator to control the traffic between the Server and the Client during video streaming. UDP was chosen as the transport protocol for the streaming to avoid additional delays upon loss of packets. Three videos of different Resolutions (480p, 360p, 240p) were used for the experiment.
Each video was 30 seconds long. As disturbances we used 3 Initial Delay values (0s, 2s, 7s) and 3 Stall values (0, 1, 2). The algorithm of the script used can be explained in 4 steps:
1. Playing the disturbances in a random sequence: There are in total 27 combinations of disturbances that can be introduced using the Resolution, the Initial Delay and the no. of Stalls. The idea was to use a 3x9 matrix with the numbers 1-27 arranged in ascending order, associating each number with a combination of (R, I.D, S). This was done so that the 27 combinations of disturbances are played out in a different order for every user. Care was also taken that the videos do not play back to back, thereby giving the users more time to mark their responses.
2. The second step was to start VLC at the Client and begin collecting the VLC logs.
3. The Shaper was configured such that when the no. of stalls is 1, a stall of 2s is induced after 15s of playing the video, and when the no. of stalls is 2, a first stall of 1s and, with 5s in between, a second stall of 3s are induced into the video while streaming. The idea was to keep the average length of a stall at 2 seconds.
4. The final step was to establish a connection with the Server and request the stream.
The Shaper was set up as a router with two interfaces, eth0 and eth1: eth0 faces the Server and eth1 faces the Client.
Subjective Tests: They were conducted with 15 users. A questionnaire containing the following questions was handed out to them after playing each video. A continuous scale from 1 to 5, with only the integers marked on it, was used. Here 1-poor, 2-bad, 3-fair, 4-good, 5-excellent. The acquired data was stored in Excel and the following data analysis was performed.
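As a rough illustration of step 1 and of the stall pattern in step 3, here is a minimal Python sketch. It is not the script that was used in the experiment: the function names, the order in which the 27 combinations are numbered, and the idea of seeding the shuffle with a user number are all assumptions made only for this example.

```python
import itertools
import random

RESOLUTIONS = ["480p", "360p", "240p"]
INITIAL_DELAYS_S = [0, 2, 7]
STALL_COUNTS = [0, 1, 2]

# Stall durations in seconds for each stall count, as described in step 3.
STALL_DURATIONS_S = {0: [], 1: [2], 2: [1, 3]}


def numbered_conditions():
    """Associate the numbers 1..27 with the (R, I.D, S) combinations.

    The numbering order (Resolution outermost) is an assumption.
    """
    combos = itertools.product(RESOLUTIONS, INITIAL_DELAYS_S, STALL_COUNTS)
    return dict(enumerate(combos, start=1))


def play_order(user_id):
    """Return the numbers 1..27 in a random order that differs per user.

    Seeding with the user number is a hypothetical choice that makes the
    order reproducible for a given user.
    """
    order = list(range(1, 28))
    random.Random(user_id).shuffle(order)
    return order


def mean_stall_length(stall_count):
    """Average length of a stall in seconds (0.0 when there are no stalls)."""
    durations = STALL_DURATIONS_S[stall_count]
    return sum(durations) / len(durations) if durations else 0.0
```

With this sketch, `numbered_conditions()[n]` gives the combination to play for each number `n` in `play_order(user_id)`, and `mean_stall_length` returns 2.0 for both the 1-stall and the 2-stall case, which is the 2-second average mentioned above.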
--------------------------------------------------------------------------------------------------------------------------
Results Analysis: Main Effects Plots: We shall now look at the main effects of the disturbances, individually, on the QoE. The first plot shows the impact of Resolution for 480p, 360p and 240p, the second plot shows the impact of Initial Delay for 0s, 2s and 7s, and the third plot shows the impact of Stallings for 0, 1 and 2 stalls on QoE. In the first plot, you can see that the mean MOS value for 360p is greater than that for 480p, implying that the users liked the 360p videos better than the 480p videos, on average. This is probably because the 480p videos showed disturbances even before any additional disturbances were introduced in the network. In the second plot, you can observe that the line connecting the mean MOS at I.D=2s and I.D=7s is parallel to the x-axis, implying that there is no main effect between the two groups. In other words, 2s I.D and 7s I.D showed the same level of impact on the MOS, on average. In the third plot, however, the impact of 0, 1 and 2 stalls on MOS is significant. The steepness of a line shows how strong an impact is. The slope of the line between 0 and 1 stalls is greater than the slope of the line between 1 and 2 stalls, implying that the first stall has a greater impact than an additional second stall.
--------------------------------------------------------------------------------------------------------------------------
Interaction Plots: In our study, it is also necessary to understand the effect of multiple factors on QoE. In each of these plots, the impacts of the three factors were considered two at a time. An interaction plot displays the levels of one variable on the x-axis and has a separate line for the means of each level of the other variable. Parallel lines indicate no interaction between the two factors, whereas the more non-parallel the lines are, the greater the interaction.
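To make the idea of an interaction plot concrete, the points on such a plot can be computed as sketched below in Python. This is only an illustration: the function names, the field names and the ratings are hypothetical and are not taken from the test data.

```python
from collections import defaultdict
from statistics import mean


def cell_means(ratings, x_factor, line_factor):
    """Mean MOS for every (line level, x level) pair of two factors."""
    groups = defaultdict(list)
    for row in ratings:
        groups[(row[line_factor], row[x_factor])].append(row["mos"])
    return {key: mean(values) for key, values in groups.items()}


def line_gaps(means, line_a, line_b, x_levels):
    """Vertical gap between two lines at each x level.

    Equal gaps mean the two lines are parallel, i.e. no interaction.
    """
    return [means[(line_a, x)] - means[(line_b, x)] for x in x_levels]


# Hypothetical ratings, invented only to exercise the functions.
ratings = [
    {"stalls": 0, "delay": 0, "mos": 4.0},
    {"stalls": 0, "delay": 0, "mos": 5.0},
    {"stalls": 0, "delay": 7, "mos": 4.0},
    {"stalls": 2, "delay": 0, "mos": 2.0},
    {"stalls": 2, "delay": 7, "mos": 1.5},
]
means = cell_means(ratings, x_factor="delay", line_factor="stalls")
gaps = line_gaps(means, line_a=0, line_b=2, x_levels=[0, 7])
```

In this made-up example the gap between the 0-stall line and the 2-stall line is the same at both delay values, so the two lines would be parallel and there would be no interaction between the two factors.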
Starting from the top left corner, the first plot shows MOS vs I.D with Resolution as the parameter of the curves. Here again, you notice 360p (the red line) performing better than 480p. The second plot shows MOS vs Stallings, again with Resolution as the parameter of the curves. In general, you can observe the curves dropping as the no. of Stalls increases. The third plot shows MOS vs Stallings again, but with I.D as the parameter of the curves. Here also, you can notice the curves dropping as the Stalls increase. Also, the curves mostly overlap, which implies that the I.D does not have much of an impact on the MOS. In the fourth plot, MOS has been plotted against I.D with Stalls as the parameter of the curves, and here you can notice how drastically the presence of stalls has shifted the curves. So, from plots 3 and 4, we know that the Stalls showed a greater impact while the I.D did not show much of an impact.
--------------------------------------------------------------------------------------------------------------------------
CDF Plots: Then the Cumulative Distribution Functions were studied to compare and understand the distribution of data from the different sets. A CDF gives the probability of a value being less than or equal to a given data point. It is also referred to as the Actual Frequency. The plots here show an accumulation of ratings around the labels despite a continuous scale having been used in the tests to record the user ratings, probably because of the integer markings.
--------------------------------------------------------------------------------------------------------------------------
ANOVA: Univariate ANOVA tests were conducted on the MOS data to see what statistical inferences can be drawn from the test results. The analysis is called univariate because there is only one dependent variable (MOS). There are 3 independent variables that affect MOS: the Resolution, the I.D and the no. of stalls.
The table here shows how it has categorised the data. Under each factor there are three levels, and in each level there are 135 data points: there are 27 combinations of disturbances with 15 data points each, and with one factor fixed there are 9 possible combinations of the other two factors, so 9*15=135. Before analysing the ANOVA results, the Null Hypotheses need to be established. These are simply negative statements saying that the factors have no significant impact on MOS, and that the interactions between the factors taken two at a time also have no significant impact on MOS.
--------------------------------------------------------------------------------------------------------------------------
This table here shows the test results. In the first column, we have the sum of squares (SoS) for each factor, the sum of squares for two factors, the sum of squares of all factors and the sum of squares error (defined as the SoS between every data point and the group mean). The second column indicates the degrees of freedom (df). The third column has the mean squares, which are calculated by dividing the SoS by the corresponding df. The fourth column has the F-scores, which are calculated as the ratio between the corresponding mean square and the mean square error. The fifth column shows the significance value, i.e. the probability of obtaining an F-score at least this large if the corresponding null hypothesis, established earlier, were true. So, if the probability > 0.05 we fail to reject the null hypothesis, and if the probability < 0.05 we reject the null hypothesis.
--------------------------------------------------------------------------------------------------------------------------
The results are as follows!
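For anyone who wants to retrace how the columns of the ANOVA table relate to one another, the arithmetic can be sketched in Python as follows. The function names are hypothetical, and the sum-of-squares values in the usage note are made-up numbers, not the values from our table.

```python
def observations_per_level(levels_per_factor=3, factors=3, users=15):
    """Data points in one level of one factor: 9 combinations times 15 users."""
    combos_with_one_factor_fixed = levels_per_factor ** (factors - 1)
    return combos_with_one_factor_fixed * users


def mean_square(sum_of_squares, dof):
    """Third column: sum of squares divided by its degrees of freedom."""
    return sum_of_squares / dof


def f_score(ss_effect, df_effect, ss_error, df_error):
    """Fourth column: mean square of the effect over the mean square error."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_error, df_error)


def reject_null(p_value, alpha=0.05):
    """Fifth column: reject the null hypothesis when p is below alpha."""
    return p_value < alpha
```

For example, `observations_per_level()` gives 135, and with an invented effect SoS of 40.0 on 2 df and an invented error SoS of 200.0 on 400 df, `f_score(40.0, 2, 200.0, 400)` gives 20.0 / 0.5 = 40.0. The probability in the fifth column comes from the F distribution, which this sketch does not compute.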
--------------------------------------------------------------------------------------------------------------------------
Then the impact of I.D on MOS was explored using the exponential and the logarithmic functions. The two fits were found to overlap, implying that the I.D could be modelled using either an exponential or a logarithmic mapping. Also, since we have established in the previous slides that the I.D does not have a significant impact on MOS, the exact choice of function is not critical. The coefficient ‘a’ indicates by how much the curves are translated along the MOS scale. With an increase in the no. of stalls, you can see its value drop.
--------------------------------------------------------------------------------------------------------------------------
The impact of Stalls on MOS was studied using the exponential model. ‘a’ is the scaling factor here, and you can notice how it decreases as the I.D increases. Why is it multiplicative and not additive between I.D and Stallings? For example, look at the 240p user ratings here in the tables. You can notice how an additive model has made the MOS drop for I.D=7s. This is how we were able to understand that the multiplicative model is better than the additive model.
In formulating the overall formula, two aspects were considered: 1. What kind of individual impact does each factor have on QoE (exponential, logarithmic or power)? 2. How do the three factors interact when taken two at a time (additive or multiplicative)? There are two ways of arriving at an answer to these questions: 1. By observing the data. 2. Based on theoretical propositions.
Establishing everything theoretical first (meaning that these are the inferences we can draw from the theory available in the literature), we have:
1. To begin with, the impact of stalls on MOS was established to be exponential in a study on video delivery using YouTube.
2. The impact of Stalls and Initial Delay was proposed to be multiplicative based on theoretical evidence, for the following reason: when the partial derivative of the overall QoE is taken w.r.t. Stallings as the influence factor, it turns out to depend on the current QoE level, which seems reasonable given that the I.D does not have a significant impact.
3. The impact of Resolution was described as logarithmic in a paper on Provisioning Delivery Hysteresis, which describes Resolution as a controllable degradation.
4. The same paper describes the impacts of controllable and uncontrollable QoS parameters using a hysteresis loop. Since in our case Stallings is the uncontrollable degradation, the overall impact of Resolution and Stallings can be derived as multiplicative, because introducing Stallings at any point in the QoE vs R graph makes the curve drop, which is the nature of a multiplicative model.
5. We have established, in the earlier slides, that the impact of Initial Delay could be either exponential or logarithmic.
6. There is no theoretical background for establishing the impact of I.D and R on MOS. The combined mapping of I.D and R on MOS can be derived from a data-driven approach. Both additive and multiplicative models were studied (refer to the formulae).
How well a formula fits the user ratings can be assessed based on 1. Confidence Intervals and 2. Correlation. 1. Throughout the phase of deriving the formulae that best fit the data, it was made sure that the curves based on these formulae fall within the C.I bounds. 2. Correlation gives a measure of closeness between the user ratings collected from the subjective tests and the MOS values obtained from the formulae. The coefficients for Stallings in f4 look more reasonable than those in f6, because for the same increase in the no. of stalls the user ratings drop faster with the f6 formula. Because of an insufficient number of user results, we had to settle for a formula with R²=0.2008.
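As an illustration of the correlation check, the closeness between user ratings and formula values can be computed as in the Python sketch below. The function name and the numbers are hypothetical, and squaring the Pearson correlation is only one common way of arriving at an R² value; we are not claiming that the figure above was computed exactly like this.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between user ratings and model predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum(
        (o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)
    )
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_o * spread_p)


# Hypothetical values, invented only to show the call.
user_mos = [4.2, 3.1, 2.4, 3.8, 1.9]
formula_mos = [3.9, 3.3, 2.8, 3.5, 2.2]
r = pearson_r(user_mos, formula_mos)
r_squared = r ** 2  # squared correlation, a value between 0 and 1
```

A value of `r_squared` close to 1 would mean the formula follows the user ratings closely, while a low value such as the one we obtained means a large part of the variation in the ratings is not captured by the formula.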