Efficient Serialization for Fine-Grained Behavioral Data

As more and more devices become connected, data transfer is critical. Network traffic remains a closely tracked commodity, even with connection speeds that could only be dreamed of a few years ago. Making sure that everything sent over the wire is handled efficiently is therefore an important part of any development project, and it matters even more for high volumes of data such as fine-grained behavioral data.

Baselining the Data

The first step to improving serialization is to establish a baseline. The Moonsense Sample Payment App makes this simple. It is available for Web, Android, and iOS, and it makes it easy to select which behavioral data to capture and how long the capture period should last.

To give a sense of how much data is captured, the chart below shows seven monitored sensors and the number of data points each one reports per second. In total, this amounted to about 358KB of data over one minute of recording, with the accelerometer capturing in a low-frequency mode.

These data points correspond to the Moonsense sensor data model. To narrow the scope and keep the focus on the data, we'll start with one particularly chatty sensor: pointer data. Pointer data captures mouse or touch input as a mouse or finger moves across the screen. This can produce many data points within a short timeframe, which the Moonsense SDK then packages into a "Bundle" for delivery. An example bundle for one second of pointer capture looks like this:

JSON
{
  "pointerData": [
    {
      "determinedAt": 6353,
      "type": 4,
      "buttons": 0,
      "delta": {
        "dx": -1
      },
      "orientation": 0,
      "position": {
        "dx": 374.24249267578125,
        "dy": 573.9091796875
      },
      "pressure": 0,
      "pressureRange": {
        "lowerBound": 0,
        "upperBound": 1
      },
      "radiusMajor": 1,
      "radiusMinor": 1,
      "size": 1,
      "viewportBoundaryStatus": 2
    },
    {
      "determinedAt": 6361.600000023842,
      "type": 4,
      "buttons": 0,
      "delta": {
        "dx": -1,
        "dy": 1
      },
      "orientation": 0,
      "position": {
        "dx": 373.8934631347656,
        "dy": 574.5199584960938
      },
      "pressure": 0,
      "pressureRange": {
        "lowerBound": 0,
        "upperBound": 1
      },
      "radiusMajor": 1,
      "radiusMinor": 1,
      "size": 1,
      "viewportBoundaryStatus": 2
    }
… 84 more data points
  ]
}

When packaged and sent, the resulting JSON totals around 53KB. That isn't excessive by modern bandwidth standards, but keep in mind that it covers only one of the many sensors the Moonsense SDK can capture. Sensors such as the accelerometer and gyroscope can also generate substantial amounts of data.
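One reason the payload grows so quickly is that JSON repeats every field name in every data point. The sketch below illustrates this by building a synthetic bundle with the same field layout as the example above and measuring its encoded size. The values are random and hypothetical, the count of 86 points assumes the two shown plus the 84 elided, and the helper names (`make_pointer_point`, `make_bundle`, `json_size`, `key_bytes`) are illustrative rather than part of any Moonsense SDK.

```python
import json
import random

def make_pointer_point(t_ms: float, x: float, y: float) -> dict:
    """One pointer data point with the same fields as the example bundle.
    The values are synthetic; only the layout follows the example."""
    return {
        "determinedAt": t_ms,
        "type": 4,
        "buttons": 0,
        "delta": {"dx": -1, "dy": 1},
        "orientation": 0,
        "position": {"dx": x, "dy": y},
        "pressure": 0,
        "pressureRange": {"lowerBound": 0, "upperBound": 1},
        "radiusMajor": 1,
        "radiusMinor": 1,
        "size": 1,
        "viewportBoundaryStatus": 2,
    }

def make_bundle(n_points: int, seed: int = 0) -> dict:
    """A hypothetical one-second bundle of n_points with drifting positions."""
    rng = random.Random(seed)
    t, x, y = 6353.0, 374.0, 574.0
    points = []
    for _ in range(n_points):
        t += rng.uniform(8.0, 17.0)
        x += rng.uniform(-1.0, 1.0)
        y += rng.uniform(-1.0, 1.0)
        points.append(make_pointer_point(t, x, y))
    return {"pointerData": points}

def json_size(obj, pretty: bool) -> int:
    """Size in bytes of the UTF-8 encoded JSON text."""
    if pretty:
        text = json.dumps(obj, indent=2)
    else:
        text = json.dumps(obj, separators=(",", ":"))
    return len(text.encode("utf-8"))

def key_bytes(obj) -> int:
    """Bytes spent on quoted field names such as "determinedAt".
    The keys here are ASCII, so characters equal bytes."""
    if isinstance(obj, dict):
        return sum(len(json.dumps(k)) + key_bytes(v) for k, v in obj.items())
    if isinstance(obj, list):
        return sum(key_bytes(v) for v in obj)
    return 0

bundle = make_bundle(86)
compact = json_size(bundle, pretty=False)
print("pretty-printed:", json_size(bundle, pretty=True), "bytes")
print("compact:", compact, "bytes")
print(f"field names alone: {key_bytes(bundle) / compact:.0%} of the compact text")
```

Each point in this layout spends 180 bytes on field names before a single value is written, and that cost is paid again for every point in the bundle.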

Steps To Improvement

A simple first step is to gzip the request. This takes some additional work on the client side, and the backend may also need changes to accept compressed requests, but gzip is a common way to reduce network traffic.
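The client and server work described above can be sketched with Python's standard `gzip` and `json` modules. This is a minimal illustration that assumes the server honors a `Content-Encoding: gzip` request header. The function names and the sample payload are hypothetical, and no actual HTTP request is made.

```python
import gzip
import json

def encode_request(payload: dict) -> tuple[bytes, dict[str, str]]:
    """Client side: serialize to compact JSON, then gzip the bytes.
    Content-Encoding tells the server the body is compressed."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
    return gzip.compress(raw), headers

def decode_request(body: bytes, headers: dict[str, str]) -> dict:
    """Server side: the extra backend work is undoing the compression
    before the usual JSON parsing. (A real server framework would
    match header names case-insensitively; this plain dict does not.)"""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))

# Hypothetical payload: 86 pointer points that differ only in timestamp.
payload = {"pointerData": [
    {"determinedAt": 6353 + 8 * i, "type": 4,
     "position": {"dx": 374.24, "dy": 573.91}}
    for i in range(86)
]}
body, headers = encode_request(payload)
raw_size = len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
print(raw_size, "bytes raw ->", len(body), "bytes gzipped")
assert decode_request(body, headers) == payload
```

These synthetic points repeat almost everything, so they compress far better than real sensor data would. Treat the printed ratio as illustrative only.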

Gzipping the 53KB request from above drops its size to about 3KB. That's a 94% reduction! Very impressive.
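The 94% figure is the fraction of the original size that gzip removes:

```latex
\text{reduction} = 1 - \frac{\text{size}_{\text{after}}}{\text{size}_{\text{before}}}
= 1 - \frac{3\,\text{KB}}{53\,\text{KB}} \approx 0.943 \approx 94\%
```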

But why stop there? Can it be made even better?

Enter Protocol Buffers

Protocol Buffers are a Google-developed format for serializing structured data across multiple platforms, and they make serialized data considerably smaller. You define a message structure in a .proto file and then compile it into the languages of your choice. This lets a frontend written in JavaScript share a common message structure with a backend written in something like Go, Java, or C#, and each language can use the compiled messages as standard classes. The savings come from the binary wire format: each field is identified by a small numeric tag instead of its name, and numbers are written in binary instead of as text.
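In practice, the generated code handles encoding for you, but hand-encoding a message in the protobuf wire format shows where the savings come from. The sketch below follows the wire format's rules: a field tag is `(field_number << 3) | wire_type`, integers are base-128 varints, doubles are 8 little-endian bytes, nested messages are length-prefixed, and, as in proto3, zero-valued fields are omitted. The schema, its field numbers, and the helper functions are hypothetical. They cover only three of the pointer fields and are not the Moonsense data model.

```python
import json
import struct

# Hypothetical schema, for illustration only (not the Moonsense data model):
#   message Offset      { double dx = 1; double dy = 2; }
#   message PointerData { double determined_at = 1; uint32 type = 2;
#                         Offset position = 3; }

WIRE_VARINT, WIRE_FIXED64, WIRE_LEN = 0, 1, 2

def varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    """Field number and wire type packed into one varint; no field name."""
    return varint((field_number << 3) | wire_type)

def double_field(field_number: int, value: float) -> bytes:
    if value == 0:  # like proto3, skip fields holding the default value
        return b""
    return tag(field_number, WIRE_FIXED64) + struct.pack("<d", value)

def uint_field(field_number: int, value: int) -> bytes:
    if value == 0:
        return b""
    return tag(field_number, WIRE_VARINT) + varint(value)

def message_field(field_number: int, payload: bytes) -> bytes:
    """Nested message: tag, then byte length, then the encoded fields."""
    return tag(field_number, WIRE_LEN) + varint(len(payload)) + payload

def encode_pointer(determined_at: float, type_: int, x: float, y: float) -> bytes:
    offset = double_field(1, x) + double_field(2, y)
    return (double_field(1, determined_at)
            + uint_field(2, type_)
            + message_field(3, offset))

point = {"determinedAt": 6353, "type": 4,
         "position": {"dx": 374.24249267578125, "dy": 573.9091796875}}
as_proto = encode_pointer(6353.0, 4, 374.24249267578125, 573.9091796875)
as_json = json.dumps(point, separators=(",", ":")).encode("utf-8")
print(len(as_json), "bytes as JSON vs", len(as_proto), "bytes as protobuf")
```

For this single point, the sketch produces 31 bytes against 87 bytes of compact JSON. Most of the difference is field names, which the binary format replaces with one-byte tags.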

At Moonsense, we compile our protocol buffers to JavaScript with protobufjs for our Web SDK, with Apple's swift-protobuf for our iOS SDK, with Square's wire-compiler for our Android SDK, and with the standard protoc compiler for our Python SDK. We even use a generator for our SDK data model documentation to make sure it always stays up to date.

Defining the messages takes some extra effort, but it ensures consistency across all our platforms. Managing the structure this way makes it easy to ensure that every SDK serializes data the same way.

Now, how about the request size?

When the same request, which measured 53KB uncompressed and 3KB compressed as JSON, is sent using protocol buffers instead, it measures 10KB uncompressed and 2KB compressed. That is an 81% reduction for uncompressed data and a 33% reduction for compressed data.
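Both percentages compare the protocol buffer size with the matching JSON size:

```latex
\text{uncompressed: } 1 - \frac{10\,\text{KB}}{53\,\text{KB}} \approx 0.811 \approx 81\%
\qquad
\text{compressed: } 1 - \frac{2\,\text{KB}}{3\,\text{KB}} \approx 0.333 \approx 33\%
```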

In Summation

Overall, protocol buffers can significantly reduce network traffic, which improves round-trip times for network requests and lowers ingress and egress costs. They also build in a standard way of handling serialization and a common object format across the organization, which can greatly reduce the effort of moving data between parts of your infrastructure written in different languages.

Adding protocol buffers does increase your application's size. Size is especially important for web applications and SDKs, so you will want to monitor it closely. We go into more detail on how Moonsense does that in our article Keeping SDK Size in Check.