Plumbee engineering

Event Logging for Analytics


Introduction

Sometimes it’s easy to overlook the wealth of useful information generated as users interact with an application. At Plumbee, we have a highly scalable infrastructure capable of logging a huge number of events, and we mine this abundance of data for the patterns and trends that emerge when we apply analytics tools.

Using analytics on event data, we are able to constantly refine and improve our games. Here are some examples of the approaches we have taken:

  • We use data about which games generate the most spins from users to optimize the order of games in our lobby.
  • Our analytics team looked at the order of client events generated by button presses on the game lobby screen. This allowed us to determine the likelihood of a user coming back to a game based on what they did there previously. For example, we found better retention among users who clicked the “Collect coins” button and used the free spins feature, so we tailored a tutorial to guide users along this path.
  • We employed A/B testing to determine if our starting coin balance for new users was high enough to ensure a great experience when they first play. To make this analysis, we looked at events generated by a user’s first session, including how many spins they were able to make before running out of coins.

Volume and Design

Our most popular web-based game, Mirrorball Slots, generates over 150 GB of data every day as players interact with it. While this may seem like a lot of data, today’s cloud infrastructure makes it very manageable. We use various technologies to capture events and log them in a standardized format, ETL tools to move and transform the data from the queue, and analytics tools to do our analysis. In this article, we’ll focus on the first step in this process: logging the data.

What gets logged?

Almost everything from our platform and games gets logged. We do this for two reasons: first, storage is relatively inexpensive, and second, we never know what data will be useful or what type of analysis may be pertinent in the future.

Server requests

Almost all of the client-to-server HTTP RPC events are logged.

Each logged request includes the endpoint called, the HTTP request parameters, the user ID, the country code, the A/B test variant, and the JSON representation of the Protobuf message (if present).

Server database writes

All database blob writes are logged as an event. We convert the blobs from Protobufs to JSON and dispatch the event.

These messages include information such as the user’s current stored state in a slots game, their coin and cash balances, last login time, install time, experience points, and current level.

Client events

Some events are generated by the client and sent to the server for logging. These include which GPU the user’s machine has, and when the user clicks on a game, starts a session, or leaves.

Server events

These are specific events which the server dispatches to support certain types of analysis that cannot be derived from the other event types.

How do we log events?

To log events, our game servers (running on Amazon EC2 instances in CloudFormation stacks) constantly put event messages on an Amazon SQS queue. This scalable message queue lets us feed event messages asynchronously into our analytics systems without affecting the experience of users playing on our servers.

Amazon SQS stores the messages redundantly behind the scenes for up to 14 days, giving us ample time to de-queue the data and send it to our analytics system. As an example of cost, we currently pay about $0.0000005 per request which allows us to comfortably log as much data as we want.

The other advantage of SQS for us is that it is a fully-managed AWS service allowing us to focus resources on putting cool features into Mirrorball Slots!
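In outline, the producer/consumer decoupling looks like this. A minimal Python sketch: the SQS queue is stubbed with an in-memory queue, and all names are illustrative; our real servers talk to SQS through the AWS SDK.

```python
import json
import queue

# In-memory stand-in for the SQS queue (illustrative only; in production
# this would be AWS SDK calls against a real queue URL).
sqs_stub = queue.Queue()

def enqueue_event(event):
    """Game-server side: serialize the event, enqueue it, and get
    straight back to serving the player."""
    sqs_stub.put(json.dumps(event))

def drain_events(sink):
    """Analytics side: de-queue at its own pace. SQS retains messages
    for up to 14 days, so the consumer can lag well behind producers."""
    while not sqs_stub.empty():
        sink.append(json.loads(sqs_stub.get()))

# Game servers fire events without waiting on the analytics pipeline.
enqueue_event({"plumbeeUid": 1, "event": "spin"})
enqueue_event({"plumbeeUid": 2, "event": "collect_coins"})

# Later, the analytics side drains whatever has accumulated.
received = []
drain_events(received)
```

The key property is that the producer never blocks on the consumer; the queue absorbs bursts and the analytics side catches up on its own schedule.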

[Diagram: high-level overview of the event logging pipeline]

Above is a high-level diagram of the event logging pipeline. This blog post focuses on how our game servers get messages on the SQS queue.

Message format

With SQS, the base type of message you can put on the queue is a String. Many of the events we log are serialized with Google Protobufs, since we use them extensively for persistence and RPC events. For these events, we use the JSON serialization utilities in Protostuff to create JSON strings, and JSON-lib whenever we need to manipulate a message before enqueuing it. Here are some examples of our event formats.

User stats database write

{"sessionId":"xxxxxxxxxx",
 "testVariant":"_TStartingBalanceTest_VA",
 "plumbeeUid":xxxxxxxxxx,
 "shardId":0,
 "message":{"userStats":{"xp":75709143,
        "level":325,
        "score":44950615,
        "lastLoginTime":1372156968089,
        "isSpender":true
        }
    }
}

Client logging

{"sessionId":"xxxxxxxxxx",
 "testVariant":"_TStartingBalanceTest_VC",
 "plumbeeUid":xxxxxxxxxx,
 "shardId":0,
 "metadata":{"clientCall":true},
 "message":{"request":{"analytics_log_type":"loading_time_log"},
    "response":{"player":"WIN 11,7,700,224 ActiveX",
        "fbId":"xxxxxxxxxxxxx",
        "stage":"inbox",
        "loadingTime":5352,
        "os":"Windows 7"
        }
    }
}
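An envelope like the ones above can be assembled along these lines. The field names are taken from the examples; the helper function itself is hypothetical, and a plain dict stands in for the Protobuf-derived payload.

```python
import json

def build_event(session_id, test_variant, plumbee_uid, shard_id,
                message, metadata=None):
    """Wrap an event payload in the common envelope before enqueuing.

    In the real pipeline the `message` usually starts life as a Protobuf
    and is converted to JSON (e.g. with Protostuff); here it is a dict.
    """
    envelope = {
        "sessionId": session_id,
        "testVariant": test_variant,
        "plumbeeUid": plumbee_uid,
        "shardId": shard_id,
        "message": message,
    }
    if metadata:  # e.g. {"clientCall": True} for client-originated events
        envelope["metadata"] = metadata
    return json.dumps(envelope)

body = build_event("xxxxxxxxxx", "_TStartingBalanceTest_VA", 12345, 0,
                   {"userStats": {"xp": 75709143, "level": 325}})
```

Keeping the envelope fields (session, user, shard, A/B variant) uniform across all event types is what lets downstream ETL and analytics jobs join events from different sources.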

Batching

To handle the job of encoding messages into our format, we created a higher-level logging API on top of SQS. This API handles encoding and formatting, and makes use of the batching capability provided by SQS.

This approach lets us batch multiple messages into a single request and reduce SQS service costs, at the cost of some additional latency between an event occurring and being available for analytics, which is perfectly acceptable in our use case. There is also a slightly higher chance of data loss if a server crashes, since messages are held in memory a little longer. In practice this is rare, but when it does happen we have to account for it when analysing the data.
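The batching layer can be sketched like this. SQS accepts up to 10 messages per SendMessageBatch call; the buffer and flush logic below are illustrative, not our actual API.

```python
MAX_BATCH_SIZE = 10  # SQS limit on entries per SendMessageBatch request

class EventBatcher:
    """Buffer events in memory and flush them in batches of up to 10.

    Fewer requests means lower SQS costs, at the price of extra latency
    and a small risk of losing buffered events if the server crashes.
    """

    def __init__(self, send_batch):
        # send_batch would wrap an actual SendMessageBatch call
        self.send_batch = send_batch
        self.buffer = []

    def log(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= MAX_BATCH_SIZE:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []

# Capture flushed batches in a list instead of calling SQS.
sent_batches = []
batcher = EventBatcher(lambda batch: sent_batches.append(list(batch)))
for i in range(23):
    batcher.log(f"event-{i}")
batcher.flush()  # flush the remainder, e.g. on a timer or at shutdown
```

A real implementation would also flush on a timer, so that a quiet server does not hold a partial batch indefinitely.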

Logging server requests

We also created an aspect-oriented interceptor which lets us easily specify the endpoints we would like to log.

Our servers use an MVC framework in which methods in controllers can be annotated as endpoints. Simply by adding an annotation to a method, our interceptor will intercept incoming web requests to that endpoint and log the call, augmented with additional metadata such as geolocation, request parameters, and the A/B test variants the user is participating in.
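In Python terms, the annotation-driven interceptor behaves roughly like a decorator. Our actual code is an aspect-oriented interceptor in a Java MVC framework; everything below, including the request shape, is an illustrative analogue.

```python
import functools

logged_events = []  # stand-in for the SQS-backed event logger

def logged_endpoint(func):
    """Mark a controller method as an endpoint whose calls get logged,
    augmented with request metadata such as the user's A/B variants."""
    @functools.wraps(func)
    def wrapper(request, *args, **kwargs):
        logged_events.append({
            "endpoint": func.__name__,
            "params": request.get("params", {}),
            "testVariant": request.get("testVariant"),
            "countryCode": request.get("countryCode"),
        })
        return func(request, *args, **kwargs)
    return wrapper

@logged_endpoint
def spin(request):
    # The endpoint body knows nothing about logging.
    return {"status": "ok"}

spin({"params": {"bet": 100}, "testVariant": "_VA", "countryCode": "GB"})
```

The point of the pattern is that logging is declared once, at the endpoint definition, rather than repeated inside every handler.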

Logging client events

The client tracks clicks on various UI elements and forwards these events to an endpoint on the game servers, which simply passes them on to our event-logging framework. This is particularly useful when running A/B tests: it gives us data about how users interact with the interface, which we can combine with other data, such as user retention, to optimise the user experience.

Going Further

We hope this article has given you some new insights into events, logging, capturing high-volume data with queues, and the kinds of questions you can ask once you have a great event logging system in place.

- JingKei Ly
