App Store Optimization (ASO): Selecting the Right Keywords (Part 3)

Use App Store Optimization to get your game noticed in 2015. This guest post by Szilard Szasz-Toth, co-founder of Mad Labs, provides a detailed walkthrough of getting your mobile game optimized in the App Store.

In the previous two posts of this App Store Optimization (ASO) series, we described how to identify relevant keywords for your app and introduced an easy-to-follow keyword selection guide to optimize App Store search rankings. As a reminder, our iterative keyword selection process looks like this:

1.   Select relevant keywords

2.   Get keyword characteristics (difficulty and traffic)

3.   Select by difficulty

4.   Optimize traffic

Following these simple steps should already help you compose a solid keyword list, so you won’t have to guess your keywords any more. In this final part of our ASO series we introduce our own multi-factor scoring model, which should take your keyword selection skills to the next level.

1. Introducing a Multi-Factor Scoring Model

The iterative keyword selection approach is very simple, quick to apply and should already increase your keylist quality. A major drawback of this approach, however, is its iterative nature.

  • In each iteration step you will eliminate potential keywords and narrow down the list of potential candidates. If you end up with too few keywords, you will have to re-adjust your thresholds and redo the entire selection (e.g. first delete all keys with a difficulty above value X, then delete all keys with a key length above Y, and then delete all keys with a traffic below Z).
  • Order matters. You will get different results depending on which selection steps you do first. Whenever you extend the iterative process with further “filtering” steps (like keyword length), you will have to decide carefully at which stage you apply that additional filter step.

Let’s assume that you can choose any of the following keywords for your keyword list (max length 35 chars):





All five keywords have pretty similar difficulties, but two have much higher search volumes. With our iterative approach we would select the three easiest keywords (“multiplication, addition, calculation”, char length = 35). In our final iteration step we might decide to switch “calculation” with “practice”, because of the similar difficulties and the much higher traffic of “practice” (in addition “practice” can be combined with “multiplication” and “addition”). The resulting keylist “multiplication,addition,practice” is only 29 chars long, so you could even add in “math” to fill up the list.

This would be a discretionary decision, though, in contrast to a rule-based decision. Would you be able to replicate this decision in the future? Would other keyword combinations be even more successful? Each discretionary (rather than rule-based) decision makes it difficult to gauge the long-term success of your keyword selection algorithm, because such decisions are subjective and depend heavily on your current state of mind. Wouldn’t it be much better to rely on a robust model?

Now let’s assume that you also want to incorporate the keyword length as an evaluation factor. Will you delete very long keywords from your pool right at the beginning? With this approach you run the risk of deleting keywords that have high search volume/low difficulty without even considering them. At the same time you may be able to replace one long keyword with two shorter keywords (like in the example above, where we replaced “calculation” with “practice” and “math”). There are many different possible keyword combinations, and evaluating all of them with the iterative approach is not feasible.

You are always optimizing your keylist under certain restrictions and each change will lead to a different overall keylist quality. Using the iterative approach will not allow you to efficiently compare and evaluate the quality of different keyword combinations, because each selection takes time and requires manual work. In addition, the iterative approach is not very robust, because the slightest change in the order of the selection steps may lead to completely different results. That’s why we decided to use a multi-factor scoring model.

The benefits of our model:

  • Each keyword list has an aggregated quality score, so all keylists are directly comparable
  • The model is easily extendable to incorporate new evaluation factors (hence, highly customizable)
  • The model is robust and delivers the same results independent of the evaluation order
  • The selection process is rule-based and can be automated to reduce human interaction (avoid discretionary decisions)
  • The model is easily adjustable to account for changing circumstances during the evolution of your app (i.e. it is easy to change the influence of different factors on the keylist score)

Using an iterative approach as described earlier may not lead to the best possible keyword list, because in the first iteration step (i.e. sorting by difficulty), you may already eliminate keywords that would potentially lead to a higher overall score in combination with other keywords (keywords with very similar difficulty values, but varying length/traffic/etc). It is therefore much better to use a multi-factor scoring model.

2.  Model Framework

In our multi-factor scoring model we will need to calculate a quality score for each keyword. The quality score will depend on a variety of different factors that are relevant for your search rankings (e.g. difficulty and traffic). Each of the selected factors will have a factor weight, which determines the impact of each factor on the quality score of a keyword. The overall quality score of a keyword list is then calculated as the sum of the individual keyword quality scores.

2.1  Keyword Quality Score

The quality score of a keyword $k$ will be calculated as the weighted sum over all factor scores according to the following formula:

$$Q_k = \sum_{i=1}^{n} w_i \, f_{i,k} \qquad (1)$$

where $w_i$ is the weight of factor $f_i$, $i \in \{1, \dots, n\}$ with $n$ factors, and $f_{i,k}$ is the score of factor $f_i$ for keyword $k$ in the keyword pool.
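
As a minimal sketch, the weighted sum could look like this in Python. The function name, factor names, weights and scores are made-up illustration values, not data from our model.

```python
def keyword_quality_score(factor_scores, weights):
    """Weighted sum of factor scores for one keyword.

    factor_scores: dict mapping factor name -> score (e.g. normalized 0..10)
    weights:       dict mapping factor name -> factor weight
    """
    missing = set(weights) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return sum(weights[name] * factor_scores[name] for name in weights)


# Hypothetical weights and scores, chosen only for illustration.
weights = {"difficulty": 0.7, "traffic": 0.3}
scores = {"difficulty": 8.0, "traffic": 5.0}
example_score = keyword_quality_score(scores, weights)  # 0.7 * 8 + 0.3 * 5
```

Adding a further factor (such as keyword length) only requires one more entry in both dictionaries.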


2.2 Model Factors

In general, you can include as many factors in your model as you prefer. Some factors may make more sense than others. Here is a list of different factors that will influence the quality score of your keyword:

  • Keyword relevance
  • Keyword difficulty
  • Keyword traffic
  • Length of keyword (measured in chars)
  • Total number of apps using a keyword (which is likely already used to calculate keyword difficulty, depending on your ASO tool)
  • Word frequency within a sample (how often does a word occur in a set of sample texts, e.g. Word Frequency Data)

For each keyword you will also have to determine individual factor scores. We suggest normalizing the score of each factor to a range between 0 and 10 (with 0=worst and 10=best) to make all values comparable, e.g.


Another advantage of normalizing values between 0 and 10 (worst to best) is that your keylist scores (which are the weighted sums over the individual keyword scores) will be easily comparable and you can choose the keylist with the highest overall score.
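
One possible way to do this normalization is simple min-max scaling, sketched below. The scaling method is an assumption for illustration (any mapping to 0..10 with 10 as the best value would fit the framework), and the raw values are invented. Note that difficulty has to be inverted, because a lower raw difficulty is better.

```python
def normalize_0_10(values, higher_is_better=True):
    """Min-max scale raw values to the range 0..10 (0 = worst, 10 = best)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All raw values are identical, so no keyword is better than another.
        return [5.0 for _ in values]
    scaled = [10.0 * (v - lo) / (hi - lo) for v in values]
    if not higher_is_better:
        scaled = [10.0 - s for s in scaled]
    return scaled


# Hypothetical raw values as an ASO tool might report them.
raw_difficulty = [20, 35, 50]    # lower difficulty is better
raw_traffic = [100, 400, 1000]   # higher traffic is better

difficulty_scores = normalize_0_10(raw_difficulty, higher_is_better=False)
traffic_scores = normalize_0_10(raw_traffic)
```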

2.3  Factor Weights

In addition to determining the relevant factors of your model, you will have to assign weights to each factor. The higher the weight of a single factor, the larger the influence of that factor on the quality score of a keyword.

As discussed in our previous posts, the difficulty of a keyword is one of the most important selection criteria, especially if you have no App Store exposure yet. Assuming your app has low exposure, no download momentum and few reviews, the difficulty of a key should have by far the largest impact on the keyword score (i.e. a factor weight above 50%). Traffic, on the other hand, is secondary to difficulty and should have a smaller impact (i.e. a factor weight below 50%).

This general rule, however, only holds true until your app becomes more successful. With the evolution and hopefully increasing exposure of your app, the factor weights need to change accordingly. If you have released the next Flappy Bird and rank in the top charts, it would make no sense to target low difficulty keywords, because these keys generally tend to have lower traffic. Instead, you should focus on high traffic words. As pointed out earlier, each app iteration gives you the opportunity to adjust your model. Always keep in mind that you are looking at a dynamic model, not a static one.
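
To illustrate how shifting the weights changes which keyword wins, here is a small sketch. The stage names, weight values and keyword scores are assumptions that only mirror the rule of thumb above. They are not calibrated values.

```python
# Hypothetical weight profiles for two stages in the life of an app.
WEIGHT_PROFILES = {
    "new_app": {"difficulty": 0.7, "traffic": 0.3},
    "top_charts": {"difficulty": 0.2, "traffic": 0.8},
}


def keyword_score(factor_scores, stage):
    """Weighted sum of factor scores using the weights of the given stage."""
    weights = WEIGHT_PROFILES[stage]
    return sum(w * factor_scores[name] for name, w in weights.items())


# Two made-up keywords with factor scores normalized to 0..10 (10 = best).
easy_low_traffic = {"difficulty": 9.0, "traffic": 2.0}
hard_high_traffic = {"difficulty": 3.0, "traffic": 9.0}

for stage in WEIGHT_PROFILES:
    a = keyword_score(easy_low_traffic, stage)
    b = keyword_score(hard_high_traffic, stage)
    winner = "easy_low_traffic" if a > b else "hard_high_traffic"
    print(stage, "prefers", winner)
```

With these numbers the easy keyword wins for the new app, while the high traffic keyword wins once traffic carries most of the weight.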

2.4  Keylist Quality Score

The overall quality score of a keyword list $L$ is calculated as the sum over all keyword quality scores (each of which is itself a weighted sum of factor scores)

$$Q_L = \sum_{k=1}^{m} Q_k \qquad (2)$$

where $Q_k$ is the quality score of keyword $k$ as calculated in formula (1) in section 2.1, $k \in \{1, \dots, m\}$, with $m$ keywords inside of keyword list $L$, for keyword list $L$ in the set of potential keyword lists $\mathcal{L}$.

This formula allows you to calculate the aggregated quality score of any combination of keywords (i.e. a keyword list). If you come up with ten different keyword lists, you can now easily calculate and compare their aggregated quality scores. The higher the aggregated quality score of a keyword list, the higher the probability that this keyword list will lead to better overall search rankings.
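
Comparing several candidate keyword lists then comes down to a few lines of code. The sketch below assumes the keyword quality scores have already been computed. The scores and candidate lists are invented for illustration.

```python
def keylist_quality_score(keylist, keyword_scores):
    """Sum of the individual keyword quality scores in a keyword list."""
    return sum(keyword_scores[k] for k in keylist)


# Hypothetical keyword quality scores (already weighted, see section 2.1).
keyword_scores = {
    "math": 6.5,
    "practice": 7.0,
    "addition": 5.5,
    "calculation": 4.0,
}

candidates = [
    ("math", "practice"),
    ("practice", "addition"),
    ("addition", "calculation"),
]

# Pick the candidate list with the highest aggregated quality score.
best = max(candidates, key=lambda kl: keylist_quality_score(kl, keyword_scores))
```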

If you choose your model factors wisely and correctly calibrate your model, you now have a very powerful framework to quickly calculate, compare and select keyword lists. The model is easily extendable by introducing new factors with their corresponding factor weights and free of discretionary decisions. If you are as lazy as we are, you can even write a script to automate the selection process of your keywords. This will save you a lot of time that you can spend on improving your app!


3.  A Simple Two-Factor Model Example

In a simple two-factor model with the factors difficulty and traffic, the keyword quality score simplifies to

$$Q_k = w_D \, D_k + w_T \, T_k \qquad (3)$$

where $w_D$ and $w_T$ are the factor weights, and $D_k$ and $T_k$ are the keyword-specific difficulty and traffic values.

The aggregated quality score of a keyword list $L$ with $m$ keywords is then calculated as

$$Q_L = \sum_{k=1}^{m} \left( w_D \, D_k + w_T \, T_k \right) \qquad (4)$$

Let’s assume that our keyword pool only consists of four keys (add, subtract, multiply, divide), our keyword list must be smaller than 15 chars, and we will assign 70% weight to the difficulty factor (30% to traffic). Our keys have the following characteristics:


We can then calculate the aggregated quality score of each potential keyword list (i.e. every combination of keywords with fewer than 15 chars, not considering single-word keyword lists):

[Table (5): candidate keyword lists and their aggregated quality scores]


The quality score for the first keylist, as an example, is calculated as:

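$$Q_{L_1} = Q_{k_1} + Q_{k_2}$$

$$= (0.7 \cdot D_{k_1} + 0.3 \cdot T_{k_1}) + (0.7 \cdot D_{k_2} + 0.3 \cdot T_{k_2})$$

with $k_1$ and $k_2$ the two keywords of the first keylist $L_1$.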

In this simple example we would choose the keylist “add,subtract”, because it satisfies the keylength restriction and has the highest overall quality score.
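
The enumeration in this example can be sketched in a few lines of Python. Only the four keywords, the 15 char limit and the 70/30 weights are taken from the example. The difficulty and traffic scores below are placeholder values chosen for illustration and are not the values from our table.

```python
from itertools import combinations

W_DIFFICULTY, W_TRAFFIC = 0.7, 0.3
MAX_CHARS = 15  # the keyword list must be shorter than this

# keyword -> (difficulty score, traffic score), normalized 0..10, 10 = best.
# Placeholder values for illustration only.
keys = {
    "add": (8.0, 6.0),
    "subtract": (7.0, 5.0),
    "multiply": (6.0, 4.0),
    "divide": (5.0, 3.0),
}


def keyword_score(key):
    difficulty, traffic = keys[key]
    return W_DIFFICULTY * difficulty + W_TRAFFIC * traffic


def keylist_length(keylist):
    # Length of the comma separated keyword list string.
    return len(",".join(keylist))


candidates = [
    combo
    for size in range(2, len(keys) + 1)   # skip single-word lists
    for combo in combinations(keys, size)
    if keylist_length(combo) < MAX_CHARS
]

scores = {",".join(c): sum(keyword_score(k) for k in c) for c in candidates}
best = max(scores, key=scores.get)
```

The sketch uses combinations rather than permutations, because reordering the keywords inside a list changes neither its length nor its summed score.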


4.  How We Select Our Keys

If you are curious about how we come up with our final keyword list, here is a rough outline of our process:

  1. We create a pool of relevant keywords (as described in part 1 of this series)
  2. We download keyword characteristics for all of our model factors (e.g. difficulty and traffic values)
  3. We normalize all values to be in a range between 0 and 10 (from worst to best)
  4. We delete all keywords with zero traffic
  5. We run a difficulty screen on our keyword pool. As described in part 2 of this series, we drop out all keywords with too high difficulty values (i.e. only take keywords in the first two difficulty brackets “target” and “intermediate”, depending on the size of our pool)
  6. We calculate individual keyword quality scores for all remaining keywords in the pool according to formula (1) in section 2.1
  7. We create a pool of keyword lists $\mathcal{L}$. The pool of keyword lists consists of all keyword permutations (comma separated) that have a char length below 100 chars (and are not too short, because we want to use up as much space in the keylist as possible)
  8. We calculate the aggregated keylist quality score for each keyword list in our pool $\mathcal{L}$ according to formula (2) in section 2.4
  9. We select the keyword list with the highest overall quality score
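
Steps 4 to 9 of the outline above could be automated roughly as follows. This is a brute-force sketch under several assumptions: the input layout, the function name `select_keylist` and the single `max_raw_difficulty` threshold (standing in for the difficulty brackets from part 2) are hypothetical, and the sample data is invented. Enumerating every combination grows exponentially with the pool size, so this only works for small pools.

```python
from itertools import combinations


def select_keylist(pool, weights, max_raw_difficulty, max_chars):
    """Brute-force sketch of steps 4 to 9 for a small keyword pool.

    pool maps keyword -> dict with raw values ("raw_traffic",
    "raw_difficulty") and normalized 0..10 factor scores under "scores".
    """
    # Step 4: drop keywords without traffic.
    # Step 5: difficulty screen (threshold is a stand-in for the brackets).
    kept = {
        kw: data
        for kw, data in pool.items()
        if data["raw_traffic"] > 0
        and data["raw_difficulty"] <= max_raw_difficulty
    }

    # Step 6: weighted keyword quality scores.
    kw_score = {
        kw: sum(weights[f] * data["scores"][f] for f in weights)
        for kw, data in kept.items()
    }

    # Steps 7 to 9: enumerate keyword lists that fit, keep the best one.
    best_list, best_score = (), 0.0
    for size in range(1, len(kept) + 1):
        for combo in combinations(sorted(kept), size):
            if len(",".join(combo)) >= max_chars:
                continue
            total = sum(kw_score[kw] for kw in combo)
            if total > best_score:
                best_list, best_score = combo, total
    return best_list, best_score


# Invented sample pool, with a short length limit to keep the example small.
pool = {
    "math": {"raw_traffic": 900, "raw_difficulty": 60,
             "scores": {"difficulty": 2.0, "traffic": 9.0}},
    "practice": {"raw_traffic": 500, "raw_difficulty": 30,
                 "scores": {"difficulty": 7.0, "traffic": 6.0}},
    "addition": {"raw_traffic": 200, "raw_difficulty": 20,
                 "scores": {"difficulty": 9.0, "traffic": 3.0}},
    "abacus": {"raw_traffic": 0, "raw_difficulty": 10,
               "scores": {"difficulty": 10.0, "traffic": 0.0}},
}
weights = {"difficulty": 0.7, "traffic": 0.3}
best_list, best_score = select_keylist(
    pool, weights, max_raw_difficulty=50, max_chars=20
)
```

Because all scores are non-negative, longer lists score higher whenever they fit the length limit, which matches our goal of using up as much space in the keylist as possible.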

Our model takes into account a variety of different model factors such as difficulty, traffic or keyword length and we determine the factor weights according to past search ranking performance of our keys.

4.1 Model-Extension: Keyword Combinations

As we have pointed out in a previous post, keyword combinations count as well. Some keywords may be combined with other keywords to form new combined search strings. These combined search strings may also have desirable characteristics. Some people, for example, may not only search for “math”, but more specifically for “math practice”. Because the keyword “practice” can be combined with many other keys (such as “division practice”, “addition practice”, etc.), it makes sense to account for the added value of keyword combinations.

If our keyword list consists of the keywords “math, practice, addition”, we would analyze the added value of the following word permutations: “math practice”, “math addition”, “practice math”, “practice addition”, “addition math” and “addition practice”. For each keyword list we will compute all dual word permutations. For these word permutations we will download keyword characteristics such as difficulty and traffic. It is important to point out that the word order matters, e.g. the search strings “math practice” and “practice math” have different characteristics.
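
Generating the dual word permutations for a keyword list is straightforward with Python's standard library. A minimal sketch for the example list:

```python
from itertools import permutations

keylist = ["math", "practice", "addition"]

# All ordered two-word search strings that can be formed from the keylist.
dual_permutations = [" ".join(pair) for pair in permutations(keylist, 2)]

for search_string in dual_permutations:
    print(search_string)
```

A list of n keywords yields n * (n - 1) such search strings, so the number of lookups in your ASO tool grows quickly with the length of the keylist.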

If a word permutation makes absolutely no sense (like “teacher addition”), its traffic value should equal zero. Because nonsense permutations will generate no added-value for our keyword list, we will delete all combinations with zero search traffic. Next, we calculate the quality score $Q_p$ of each permutation $p$ analogously to formula (1) in section 2.1. In a final step, we compute the overall added value of all permutations for a keyword list $L$ as

$$A_L = \sum_{p=1}^{r} Q_p \qquad (6)$$

with $r$ the number of dual word permutations in keyword list $L$, where $p$ is any dual keyword permutation out of the current keyword list $L$. The extended keylist score is then equal to the sum

$$Q_L^{\text{ext}} = Q_L + A_L \qquad (7)$$
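
A sketch of the extended keylist score, assuming the added value is the plain sum of the permutation quality scores and that permutations with zero traffic have already been removed. The function name and all scores are made up for illustration.

```python
from itertools import permutations


def extended_keylist_score(keylist, keyword_scores, permutation_scores):
    """Keylist score plus the added value of its two-word permutations.

    permutation_scores maps "word1 word2" -> quality score. Permutations
    with zero traffic are assumed to be missing from the dict already.
    """
    base = sum(keyword_scores[k] for k in keylist)
    added = sum(
        permutation_scores.get(" ".join(pair), 0.0)
        for pair in permutations(keylist, 2)
    )
    return base + added


# Hypothetical scores for single keywords and surviving permutations.
keyword_scores = {"math": 6.0, "practice": 7.0, "addition": 5.0}
permutation_scores = {"math practice": 4.0, "addition practice": 3.0}

total = extended_keylist_score(
    ["math", "practice", "addition"], keyword_scores, permutation_scores
)
```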


Wrapping it up

In this final part of our ASO series we have introduced our own dynamic multi-factor scoring model and outlined how we select our keywords. The model has multiple advantages over a simple iterative selection process and should help you find the ultimate keyword list to boost your App Store search rankings and download numbers.

Even though the model might look complicated at first, once implemented, it is easy to maintain and extend. Our model framework allows you to objectively measure the success of any changes you make to your model, because it is free of discretionary and subjective decisions (as long as you follow the framework, of course ;).

As with any ASO technique, it is extremely important to adjust your model over the evolution of your app and to react to changes in the App Store environment. Always remember that ASO is a continuous process and takes time. With our framework, however, we have laid the groundwork to minimize the time you spend selecting your keywords. So in the future, you can hopefully spend less time on ASO and more time on improving your app.

If you have any questions or would like to discuss our approach, please feel free to contact us or leave us a comment below!

Good luck with your ASO endeavours!

The author of this article is Szilard Szasz-Toth, co-founder of Mad Labs – a German game studio, and also lead developer of the math game “Blackboard Madness: Math“.

The original blog post appeared on our dev blog:
