Jan. 26, 2021, 10:08 a.m.

R
*· 8 min read*

An application's average user rating is a good indication of whether users enjoy the app and of its level of success. If users are required to purchase an application, their standards for it will likely be higher than for a free application. Given that, one might assume a paid application must offer features that make it better than free alternatives, so paid applications should receive higher ratings than free ones.

While charging for an application is a quick and effective way to generate revenue, does that necessarily mean paid applications are rated higher than free ones? This is an important question to ask yourself when deciding whether to make an application free.

If you are planning to launch an application and are wondering whether to charge for it, this analysis may help: I fit a simple linear regression model and ran a correlation test in RStudio to measure how strongly an app's price is correlated with its average user rating.

**What is a linear regression model?**

A linear regression model fits a straight (linear) line through a set of data points, which is usually drawn on a scatter plot. The line is described by the equation `Y = mX + b`, where `Y` is the response variable, `X` is the explanatory variable, `m` is the slope, and `b` is the intercept (the predicted value of `Y` when `X` is 0). If the two variables are associated, then on average `Y` changes by the slope `m` for every one-unit increase in `X`.

In my case, an application's price is the explanatory variable and its average user rating is the response variable. If there is a relationship between these two variables, then an app's average rating should change by a consistent amount (the slope) for each additional dollar the app costs.
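To make the equation concrete, here is a minimal sketch on simulated data rather than the Kaggle dataset: it picks a hypothetical slope `m` and intercept `b`, generates noisy ratings from `Y = mX + b`, and checks that R's *lm()* recovers values close to them. All numbers are made up for illustration.

```r
# Simulated data, NOT the Kaggle dataset: pick a hypothetical true
# slope m and intercept b, then generate ratings as Y = mX + b + noise.
set.seed(42)

m <- 0.05  # hypothetical slope: extra stars per extra dollar
b <- 4.0   # hypothetical intercept: predicted rating when price is $0

price  <- runif(500, min = 0, max = 10)         # simulated prices (USD)
rating <- b + m * price + rnorm(500, sd = 0.1)  # Y = mX + b, plus noise

# lm() estimates b and m from the data
fit <- lm(rating ~ price)
coef(fit)  # "(Intercept)" estimates b; "price" estimates m
```

Because the simulated noise is small, the estimates land close to the chosen values; real ratings are much noisier.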

**What is correlation?**

Correlation (here, Pearson's correlation coefficient) is a number between -1 and 1 that measures how strongly two variables are linearly associated. If two variables are correlated, they tend to move together: as one increases, the other tends to increase (positive correlation) or decrease (negative correlation). In my example, a positive correlation between an app's price and its rating would mean that more expensive apps tend to receive higher ratings.

A correlation close to 1 indicates a strong positive correlation, a correlation close to -1 indicates a strong negative correlation, and a correlation close to 0 indicates a weak (or no) linear correlation.
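As a sketch of what the correlation number measures, the function below computes Pearson's correlation by hand (the covariance of the two variables divided by the product of their spreads) on a few made-up vectors. `pearson_r` is a hypothetical helper name; R's built-in *cor()* returns the same values.

```r
# Pearson's r: sum of co-deviations divided by the square root of
# the product of each variable's sum of squared deviations.
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- c(1, 2, 3, 4, 5)
pearson_r(x, 2 * x + 1)         # points on a rising line: 1
pearson_r(x, -3 * x + 10)       # points on a falling line: -1
pearson_r(x, c(2, 5, 1, 4, 3))  # scrambled values: 0.1, close to 0
cor(x, c(2, 5, 1, 4, 3))        # built-in cor() agrees: 0.1
```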

However, it is important to remember that correlation does not imply causation. Just because two variables are correlated does not mean one variable directly causes the other. This matters because, while the data might suggest one thing, you should also ask whether the results reflect how consumers actually behave. This is known as external validity: the extent to which the results of your analysis hold for real consumers' behavior outside your dataset.

**How Strongly Does an App's Price Dictate Users' Ratings?**

To answer this question, I downloaded a dataset from Kaggle, which offers thousands of datasets for all kinds of projects. I selected a dataset called appstore_games and chose two quantitative variables: (1) price and (2) average user rating. The dataset contains information on 17,008 different applications from the Apple App Store.

**Steps:**

If you want to follow along or perform your own test with a dataset you found, simply follow these steps.

- Once RStudio is running, create a new R Markdown file. You can either delete the template content RStudio adds automatically or start your work below it.
- Load your dataset. Feel free to name the dataset whatever is convenient for you. I named mine *apps* because that is the information I am dealing with.
- Next, I need to filter my data so I only keep rows with values for the two variables I want. You may or may not need to do this step. When you filter data, you MUST write the exact column name as R reads it; `read.csv()` replaces the spaces in a column name like "Average User Rating" with dots, so it becomes `Average.User.Rating`. My dataset contains a lot of "NA" (not available) values, so I am going to remove them. To filter data, you need the dplyr package: install it once with `install.packages("dplyr")`, then load it with `library(dplyr)`. Here's the code:

```
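# Load dplyr (install it first with install.packages("dplyr") if needed)
library(dplyr)

# Read the CSV and keep only rows where both variables are present
apps <- read.csv("~/path_to_your_dataset/Name_of_your_file.csv") %>%
  filter(!is.na(Price) & !is.na(Average.User.Rating))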
```
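Before moving on, it can help to confirm that the cleaning step worked. This sketch uses a small made-up data frame in place of the real CSV, and base R (so it runs without dplyr), to count missing values before and after keeping only the rows where both columns are present, which is what the *filter()* step above does.

```r
# Made-up stand-in for the raw CSV, with some missing values
apps_raw <- data.frame(
  Price = c(0, 2.99, NA, 0.99, 0),
  Average.User.Rating = c(4.5, NA, 3.0, 4.0, 5.0)
)

colSums(is.na(apps_raw))  # missing values per column before cleaning

# Keep rows where both columns are present (base-R version of the filter step)
apps <- apps_raw[!is.na(apps_raw$Price) & !is.na(apps_raw$Average.User.Rating), ]

nrow(apps)            # 3 rows remain
colSums(is.na(apps))  # 0 missing values in each column
```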

Understanding the code: First, I loaded dplyr into my R session, then told R to read the .csv file I downloaded and store it as *apps*. The pipe operator `%>%` passes that data to *filter()*, which keeps only the rows where neither *Price* nor *Average.User.Rating* is "NA". My cleaned dataset only contains apps that have data for both variables.

**Findings:**

**Graph #1**

```
library(ggplot2)
# Scatter plot: each app's price (x) against its average user rating (y)
ggplot(apps) +
  geom_point(aes(x = Price, y = Average.User.Rating), color = "red") +
  labs(title = "Average User Rating by Price", x = "Price of Application (US Dollars)", y = "Average User Rating")
```

Interpreting the graph: In this chart, most of the data points are on the left side of the graph, meaning the majority of the apps are either free or cost a few dollars. There are also a few outliers, data points that are significantly different from the rest: one app appears to cost roughly $180 but received 4.5 stars.

Understanding the code: First, I load the *ggplot2* package, which lets you make many different types of graphs to represent your data. I use *ggplot()* to call on *apps*, then add a scatter plot with *geom_point()* using an aesthetic mapping where the x-axis is the *Price* of the applications and the y-axis is the *Average.User.Rating*. I want the points to be red, so I set *color =* accordingly, and finally I use *labs()* to create titles for my graph and its x- and y-axes.
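If you want to list the outliers instead of spotting them by eye, one option is to flag prices above a cutoff. This sketch uses a made-up data frame (the *Name* column and all values are hypothetical) and the common 1.5 * IQR rule as the cutoff; that rule is my choice for illustration, not something the original analysis used.

```r
# Made-up data: the Name column and all values are hypothetical
apps <- data.frame(
  Name = c("App A", "App B", "App C", "App D", "App E"),
  Price = c(0, 0.99, 2.99, 0, 179.99),
  Average.User.Rating = c(4.0, 3.5, 4.5, 5.0, 4.5)
)

# Cutoff from the 1.5 * IQR rule: upper quartile + 1.5 * interquartile range
q <- quantile(apps$Price, c(0.25, 0.75))
cutoff <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

outliers <- apps[apps$Price > cutoff, ]
outliers  # only App E is above the cutoff
```

With so many free apps, the quartiles sit near $0, so even modest prices can be flagged on real data; treat the cutoff as a starting point for a closer look.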

**Graph #2**

```
ggplot(apps, aes(x = Price, y = Average.User.Rating)) +
  geom_point() +
  # method = "lm" draws the least-squares line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Regression Model of Applications' Price and the Average User Rating",
       x = "Applications' Price", y = "Average User Rating") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title
```

Interpreting the graph: The red regression line slopes slightly upward, which suggests a weak positive association between an app's price and its average user rating. However, the data points are scattered far from the line, which suggests the correlation is close to 0.

Understanding the code: Similar to the previous graph, I begin my code with *ggplot()*, calling on my dataset *apps* and assigning *Price* as the x-value and *Average.User.Rating* as the y-value. Next, I add *geom_point()* to create the scatter plot, *geom_smooth()* with `method = "lm"` to draw the linear regression line, and *labs()* to give my graph and its x- and y-axes titles. Finally, *theme()* centers the plot title.
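To see how a line can slope upward while the correlation stays near 0, this sketch simulates prices and ratings with a tiny true slope and a lot of random scatter (all values are made up). The fitted slope comes out positive, but because the points sit far from the line, the correlation is small.

```r
set.seed(1)
n <- 5000
price  <- runif(n, min = 0, max = 20)            # simulated prices (USD)
rating <- 4 + 0.01 * price + rnorm(n, sd = 0.8)  # tiny slope, lots of scatter

fit <- lm(rating ~ price)
coef(fit)[["price"]]  # positive slope
cor(price, rating)    # yet the correlation is small
```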

**Correlation Number**

`cor(apps$Average.User.Rating, apps$Price)`

Interpreting the data: After running the correlation command, I got 0.0459, which is very close to 0. As I mentioned earlier, a correlation close to 0 implies a weak linear relationship between the two variables.

Understanding the code: R has commands that do a lot of the work for you! I computed the correlation with *cor()*, passing it two columns from my dataset: `apps$Average.User.Rating` (the Y variable) and `apps$Price` (the X variable). The `$` operator selects a column from a data frame. The order of the two arguments does not matter, because correlation is symmetric. Running this line of code returns the correlation coefficient.
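*cor()* only returns the coefficient. If you also want a significance test, base R's *cor.test()* returns the correlation, a p-value, and a confidence interval. This sketch runs it on simulated data (made-up values) and checks two things: swapping the arguments of *cor()* does not change the result, and the *cor.test()* p-value matches the slope's p-value from *lm()*, since in simple linear regression both test the same hypothesis.

```r
set.seed(3)
price  <- runif(300, min = 0, max = 10)            # simulated prices
rating <- 4 + 0.02 * price + rnorm(300, sd = 0.5)  # simulated ratings

# Argument order does not matter for cor()
all.equal(cor(rating, price), cor(price, rating))  # TRUE

ct <- cor.test(price, rating)  # Pearson correlation test by default
ct$estimate  # same r as cor(price, rating)
ct$p.value   # p-value for "true correlation = 0"
ct$conf.int  # 95% confidence interval for the correlation

# In simple linear regression, the slope test gives the same p-value
fit <- lm(rating ~ price)
all.equal(ct$p.value, summary(fit)$coefficients["price", "Pr(>|t|)"])  # TRUE
```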

**Linear Regression Model Output**

```
regression_apps <- lm(Average.User.Rating ~ Price, data=apps)
summary(regression_apps)
```

Interpreting the information: The linear regression command prints a lot of information that can seem confusing. To determine whether price has an influence on an app's rating, I only need the information boxed in red. The box on the left shows the coefficient for *Price*: on average, each additional dollar an application costs is associated with a 0.007065-star increase in its average user rating (this is the slope of the linear regression equation).
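Because the slope is measured in stars per dollar, multiplying it by a price difference gives the predicted change in rating. This sketch does that arithmetic with the 0.007065 slope reported above; with your own fitted model, `coef(regression_apps)[["Price"]]` returns the unrounded slope.

```r
slope <- 0.007065  # Price coefficient reported above, in stars per dollar

price_increase <- c(1, 10, 100)  # price differences in US dollars
predicted_change <- slope * price_increase
predicted_change  # roughly 0.007, 0.071, and 0.707 stars
```

Even a $100 price difference corresponds to less than one star, which fits the weak relationship seen in the graphs.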

The box on the right gives me the p-value, which helps answer my main question: does an app's price influence its average user rating? The p-value is the probability of observing a slope at least as far from zero as mine if price actually had no linear relationship with rating. When a p-value is less than 0.05, the result is statistically significant, meaning the data provide evidence of a relationship between the two variables. When a p-value is greater than 0.05, the result is not significant, and there is not enough evidence to conclude there is a relationship. In my situation, the p-value is 0.1068, which is greater than 0.05, meaning I **cannot** conclude an app's price influences its average user rating.
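If you prefer to read the p-value programmatically instead of from the printed output, the coefficient table returned by *summary()* can be indexed by row and column name. This sketch reuses the article's names (*apps*, *regression_apps*) but fits the model to simulated data with no real price effect; it pulls out the *Price* p-value, recomputes it from the t statistic to show where it comes from (a two-sided t-test of "slope = 0"), and compares it with the 0.05 threshold.

```r
set.seed(11)
# Simulated data with no real price effect (values are made up)
apps <- data.frame(Price = runif(400, min = 0, max = 10))
apps$Average.User.Rating <- 4 + rnorm(400, sd = 0.6)

regression_apps <- lm(Average.User.Rating ~ Price, data = apps)
coefs <- summary(regression_apps)$coefficients

p_value <- coefs["Price", "Pr(>|t|)"]

# The same p-value from the t statistic: a two-sided test of "slope = 0"
t_stat <- coefs["Price", "t value"]
p_manual <- 2 * pt(-abs(t_stat), df = regression_apps$df.residual)
all.equal(p_value, p_manual)  # TRUE

alpha <- 0.05
if (p_value < alpha) {
  message("Significant: evidence of a linear relationship")
} else {
  message("Not significant: cannot conclude a linear relationship")
}
```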

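Understanding the code: *lm()* tells R to fit a linear regression model. The formula `Average.User.Rating ~ Price` sets *Average.User.Rating* as the Y variable and *Price* as the X variable, and `data = apps` tells R where to find those columns. *summary()* then prints the coefficients, p-values, and other statistics for the model.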
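The *summary()* output also reports a "Multiple R-squared". In a simple linear regression like this one, R² equals the squared correlation, which helps put the 0.0459 correlation in perspective. This sketch does that arithmetic and then checks the identity on simulated data (made-up values).

```r
r <- 0.0459  # correlation reported above
r^2          # about 0.0021: price accounts for roughly 0.2% of the variation

# Check the identity R^2 = r^2 for a simple linear regression on simulated data
set.seed(7)
x <- runif(200)
y <- 2 * x + rnorm(200)
fit <- lm(y ~ x)
all.equal(summary(fit)$r.squared, cor(x, y)^2)  # TRUE
```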

**Conclusions:**

Given all the data and calculations, there is only a very weak correlation (0.0459) and no statistically significant relationship between the price of an application and its average user rating. In other words, in this dataset an application's price does not appear to meaningfully affect the ratings it receives.

**Potential Improvements in my work:**

After reviewing my experiment, I realized I could have refined my data further by selecting only the Price and Average User Rating columns, so there is less information to handle. Additionally, if I were to build another regression model with a different dataset, I could include multiple explanatory variables that might influence the response variable (a multiple linear regression) instead of just one. That would give me more information to test the assumptions I want to make about a given dataset. Finally, I could choose a dataset with more debate and discussion behind it to see whether I can find variables with stronger positive associations.
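Both improvements can be sketched in a few lines. This example uses a simulated data frame; *User.Rating.Count* is a hypothetical second explanatory variable (I'm assuming the dataset has a similar column, so check your own column names). It keeps only the needed columns, the same job *dplyr::select()* would do, then fits a multiple linear regression by adding a second term to the *lm()* formula.

```r
set.seed(5)
n <- 500
# Simulated stand-in for apps; User.Rating.Count is a hypothetical column
apps <- data.frame(
  Name = paste("App", 1:n),  # a column this analysis doesn't need
  Price = round(runif(n, min = 0, max = 10), 2),
  User.Rating.Count = rpois(n, lambda = 200)
)
apps$Average.User.Rating <- 3.5 + 0.002 * apps$User.Rating.Count + rnorm(n, sd = 0.4)

# Keep only the needed columns (dplyr::select() does the same job)
apps_small <- apps[, c("Price", "User.Rating.Count", "Average.User.Rating")]

# Multiple linear regression: add explanatory variables with "+"
multi_fit <- lm(Average.User.Rating ~ Price + User.Rating.Count, data = apps_small)
summary(multi_fit)$coefficients  # one row per term: estimate, SE, t value, p-value
```

Each coefficient in a multiple regression describes the association with one variable while holding the others constant, which is what makes it more informative than several separate one-variable models.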

**What Causes Poor App Reviews?**

If you are wondering what might be causing poor reviews on certain apps, here are a few potential problem areas:

- the app has bugs and glitches
- the app crashes often
- the app does not include features users desired
- the app removed features users enjoy

**How to Improve Your App:**

Since price does not appear to be a determining factor in whether an app receives good reviews, what factors can help improve ratings and your app store optimization (ASO)?

Apps that consistently receive low ratings are less likely to be featured on app stores and tend to stay ranked low, so it is important to pay attention to what the majority of users are asking for, which features they enjoy, and which features they dislike.

It is also critical to watch for trends. If you release an update and your ratings improve, you're heading in a positive direction; if an update leads to bad reviews, the app likely needs to be better geared toward users' interests.

One last way you could improve your app is by responding to BOTH positive and negative reviews to let users know their opinions matter and that real people are working on the app to make the user experience more enjoyable!

Follow us @ordinarycoders
