Augmenting Segment customer data with behavioral signals using the Moonsense SDK

Introduction

In a previous blog post titled Risk Management As A Dimension Of Your Data Analytics Strategy, we highlighted the importance of keeping fraud and risk top of mind when devising an analytics strategy for your company. In this post, we demonstrate how integrating Moonsense into your existing analytics infrastructure can augment customer profiles with better signals about the fraud typologies and attack vectors that compromise the security of your product.

While building a risk scoring model is a fairly sophisticated topic that warrants its own blog post, we can walk through a simplified version of how this works using the example of remotely controlled devices.

Attacks executed via Remote Access Tools (RATs) are a common threat faced by most large financial institutions. The attacker aims to gain access to the victim’s device using software installed either through social engineering or via malware. Once the software is installed, the attacker can perform any number of transactions to siphon money out of the victim’s account.

To detect these attacks, we will use the Moonsense SDK, which collects a wide range of sensor data for modeling user behavior. In our example, we configure the SDK to collect touch interactions from the user’s device along with readings from the linear accelerometer. The captured data can then be forwarded through the analytics pipeline for risk scoring. We use Segment in this example, but the same approach can be applied to other customer data platforms as well.

Before we walk through the integration, we need to take a slight detour to talk about how a risk scoring model is built using features extracted from the captured data.

Feature Extraction

Above is a high-level diagram of how our system is built for the remotely controlled device use case. The customer app integrates the Segment SDK alongside the Moonsense SDK in order to capture behavioral data. Since we need access to touch events in addition to accelerometer readings, we will integrate a very common UI pattern called a “Swipe To Buy” widget as shown below. The widget allows the customer app to capture a block of continuous touch events (i.e. gestures) in order to determine the validity of purchase transactions.

These raw events as-is are too granular to be used directly to train a risk scoring model. This data instead needs to be transformed into what we call “features” for easier consumption. A few examples of features we can extract from the touch and linear accelerometer readings are:

Touch Events

  • Duration – the time delta between two consecutive touch events.
  • Distance – the Euclidean distance between the (x, y) positions of two consecutive touch events.
  • Displacement – the cumulative sum of the distance values calculated above.
  • Velocity – the distance value divided by the duration.
  • Angle – the angle of movement between two consecutive events, computed using the arctan function.
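
As a rough illustration, the touch features listed above could be derived as in the Python sketch below. It is not Moonsense SDK code: the `TouchEvent` type and its `t`, `x`, `y` fields are hypothetical stand-ins for whatever the SDK reports, and timestamps are assumed to be in seconds. The sketch also uses `atan2` in place of a plain arctan so that vertical movement is handled.

```python
import math
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class TouchEvent:
    """Hypothetical touch sample: timestamp in seconds, position in pixels."""
    t: float
    x: float
    y: float


def touch_features(events):
    """Compute features for each pair of consecutive touch events."""
    durations, distances, velocities, angles = [], [], [], []
    for prev, curr in zip(events, events[1:]):
        dt = curr.t - prev.t
        dx, dy = curr.x - prev.x, curr.y - prev.y
        dist = math.hypot(dx, dy)
        durations.append(dt)
        distances.append(dist)
        # Guard against duplicate timestamps to avoid dividing by zero.
        velocities.append(dist / dt if dt > 0 else 0.0)
        # atan2 copes with dx == 0, unlike arctan(dy / dx).
        angles.append(math.atan2(dy, dx))
    return {
        "duration": durations,
        "distance": distances,
        # Running total of the per-pair distances.
        "displacement": list(accumulate(distances)),
        "velocity": velocities,
        "angle": angles,
    }
```

Each list has one entry per pair of consecutive events, so a window with n touch events yields n - 1 values per feature.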

Linear Accelerometer

  • Magnitude – the length of the vector calculated from each individual sensor sample.
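
The magnitude is the Euclidean norm of each accelerometer sample. Here is a minimal Python sketch, assuming each sample is an `(x, y, z)` tuple; the SDK's actual sample type may differ.

```python
import math


def magnitudes(samples):
    """Return the vector length sqrt(x^2 + y^2 + z^2) of each (x, y, z) sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
```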

Below is an example of calculating distance from touch events. Note that the Moonsense SDK returns a Bundle object containing samples from all sensors for a 1-second window.

Pointer Travel Distance
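import kotlin.math.sqrt

// Returns the distance travelled between each pair of consecutive pointer samples.
private fun getDistance(bundle: Bundle): List<Double> {
    val distance = mutableListOf<Double>()
    for (i in 1 until bundle.pointer_data.size) {
        // x coordinates of the previous and current samples
        val x1 = bundle.pointer_data[i - 1].delta.dx
        val x2 = bundle.pointer_data[i].delta.dx
        // y coordinates of the previous and current samples
        val y1 = bundle.pointer_data[i - 1].delta.dy
        val y2 = bundle.pointer_data[i].delta.dy
        distance.add(sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1)))
    }
    return distance
}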

We extract features to reduce the dimensionality of the data and remove redundancy in the events tracked by the customer app. The resulting features provide a better representation of the data for machine learning and interpretation. For example, the velocity of touch events during a swipe-to-buy gesture differs between a real user and a remote one.

These feature arrays can be further reduced to five representative values for each 1-second window: pmin (minimum), p25 (25th percentile), p50 (50th percentile), p75 (75th percentile), and pmax (maximum, i.e. the 100th percentile).

Computing Percentiles
// Summarizes one window of distance values as five percentiles.
// `percentile` and `PValues` are helpers assumed to be defined elsewhere in the app.
private fun getPValues(distance: List<Double>): PValues {
    val distancePMin = percentile(distance, 0.0) // minimum
    val distanceP25 = percentile(distance, 25.0)
    val distanceP50 = percentile(distance, 50.0) // median
    val distanceP75 = percentile(distance, 75.0)
    val distancePMax = percentile(distance, 100.0) // maximum
    return PValues(distancePMin, distanceP25, distanceP50, distanceP75, distancePMax)
}
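
The `percentile` helper used above is not shown in this post. One common definition interpolates linearly between the closest ranks. The Python sketch below assumes that definition, which may differ from the helper actually used, and takes the minimum and maximum as the 0th and 100th percentiles. The `PValues` tuple mirrors the Kotlin class of the same name, and `summarize` is a hypothetical name.

```python
from collections import namedtuple

PValues = namedtuple("PValues", ["p_min", "p25", "p50", "p75", "p_max"])


def percentile(values, p):
    """Percentile p (0-100) of values, linearly interpolating between ranks."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    rank = (p / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(values):
    """Reduce one window of feature values to five percentiles."""
    return PValues(*(percentile(values, p) for p in (0, 25, 50, 75, 100)))
```

An empty window raises an error in this sketch; an application would need to decide how to handle windows in which no touch events were captured.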

We can now augment the data collected by Segment to build context around specific transactions taking place within the app. When Segment is used to track a particular event, we attach the features to that event as Segment Properties before sending the data to the Segment endpoint:

Custom Properties
private fun onPurchase(context: Context, distancePValues: PValues) {
    val properties = Properties()
    properties["distance_p_min"] = distancePValues.pMin
    properties["distance_p_25"] = distancePValues.p25
    properties["distance_p_50"] = distancePValues.p50
    properties["distance_p_75"] = distancePValues.p75
    properties["distance_p_max"] = distancePValues.pMax
    Analytics.with(context).track("on_purchase", properties)
}
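
The same naming scheme extends to the other features. The Python sketch below only builds the property dictionary and does not call any Segment API. The `build_properties` helper is hypothetical, and any feature name other than `distance` is an illustrative assumption.

```python
def build_properties(feature_pvalues):
    """Flatten {feature: (p_min, p25, p50, p75, p_max)} into flat property keys."""
    suffixes = ("p_min", "p_25", "p_50", "p_75", "p_max")
    properties = {}
    for feature, values in feature_pvalues.items():
        if len(values) != len(suffixes):
            raise ValueError(f"expected 5 percentile values for {feature!r}")
        for suffix, value in zip(suffixes, values):
            # Produces keys such as "distance_p_min" or "velocity_p_50".
            properties[f"{feature}_{suffix}"] = value
    return properties
```

Keeping one flat key per value, rather than a nested object, matches the property names used in the Kotlin example above.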

Training a Model

When the features reported via the customer app subsequently make their way to a Segment Destination, they can be read by the customer backend to determine a risk score using a trained machine learning model. The model in our case has been trained using a specialized dataset containing rich data supporting the swipe-to-buy interaction described above.

The training data, collected over numerous human and remotely controlled sessions, is labeled and fed into a logistic regression classifier that provides a fraud signal. Below is a code snippet describing this process using the scikit-learn Python library:

Model Training
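from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `data` is assumed to be a pandas DataFrame of labeled sessions, and
# `p_values` the list of percentile feature column names.
X = data[p_values]
y = data["label"]

# Hold out 25% of the sessions for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier on the training split.
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predict labels for the held-out sessions.
y_pred = model.predict(X_test)

# Report how well the predictions match the true labels.
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
print("Confusion matrix:\n", cnf_matrix)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))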

Now, when an event is reported by the customer app, we extract the features and forward them to the model, which returns a binary output via the predict() call. If the transaction is valid, we can tell the customer app to show a successful confirmation dialog or prompt. If the transaction is flagged as fraudulent, the customer app can show an error message or dialog that prevents the attacker from proceeding with the transaction.
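
On the backend, the scoring step described above might look like the following Python sketch. It assumes the event arrives as a dictionary of Segment properties, that the feature order matches the columns used in training, and that label 1 marks a remotely controlled session. `FEATURES`, `is_fraudulent`, and the label convention are assumptions made for illustration, not part of the original pipeline.

```python
# Assumed feature order; it must match the columns the model was trained on.
FEATURES = [
    "distance_p_min",
    "distance_p_25",
    "distance_p_50",
    "distance_p_75",
    "distance_p_max",
]


def is_fraudulent(model, properties):
    """Return True when the model flags the event as remotely controlled.

    `model` is any object exposing a scikit-learn style predict() method.
    Assumes label 1 means fraud, which depends on how the data was labeled.
    """
    missing = [name for name in FEATURES if name not in properties]
    if missing:
        raise KeyError(f"event is missing features: {missing}")
    # predict() expects a 2-D input: one row per sample.
    row = [[float(properties[name]) for name in FEATURES]]
    return int(model.predict(row)[0]) == 1
```

Rejecting events with missing features, rather than silently filling in defaults, keeps incomplete data from being scored as if it were a real gesture.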

In closing, we have demonstrated the use of the Moonsense SDK alongside the Segment SDK to capture user behavioral data to help identify fraud and risk. While this blog post describes the steps at a high level, feel free to reach out by email or through our contact form to discuss how this may best fit your risk management needs.