Improving a conversion rate with TrailDB, TensorFlow and Golang

Thomas P
6 min read · Jun 22, 2018


During my free time, I manage different side projects. One of them, and the oldest, is ForumForAll.com, a forum hosting service. I created it in 2003, when nobody had ever heard the term “SaaS”.

It’s still open and in use, hosting more than 100k forums and totaling more than 6 million messages from something like 500k users. It’s powered by PHP 7.1, Symfony 3.x, Redis, Socket.io, Node.js, MySQL and a few secret recipes. But the metrics I really enjoy about this project are the response time and the cost of the infrastructure: €15.99/month.

But let’s face it, there’s something strange in the previous metrics: the number of users per forum. Some forums have more than 7k users, but the minimum per forum is one, so there are a looooooot of forums with a single user, which are basically unused forums.

In fact, I need a lot of new forums each day to get one that becomes really active. The brand website must have a great conversion rate so that no opportunity is missed. So I searched for new ideas to improve this conversion rate.

If I could guess the conversion probability depending on what the user is actually doing on the website, I could help the user along by pushing some events.

I looked for a solution without any technical constraint from the current stack, the software, or my personal knowledge and experience.

I imagined a solution in 3 steps:
— Record what users are doing
— Use machine learning to process this data
— Query the predicted result to decide whether to display help to the user

Record what users are doing

If I want to know what a user is going to do or not, I refer to this great African proverb:

When you don’t know where you are going, look back at where you came from.

The conversion rate must be re-evaluated at every user action. Each action is called an event, and together they form the event data.

TrailDB is an awesome database engine, designed specially for event data. The job could be done correctly by a usual SQL database like MySQL, Postgres or MariaDB, but TrailDB does it better for several reasons.

TrailDB is fast (because it ensures immutable data), has an efficient compression system, is scalable, and provides AWS integration and APIs for many languages. It fits my requirements perfectly!

I wanted a solution that doesn’t add complexity to the current project, such as creating a PHP HTTP client in the Symfony project to send events to TrailDB. My first idea was to create a proxy in Go, and finally I decided to take advantage of Nginx’s “post_action” directive, duplicating the user request to a TrailDB server written in Go. This way, latency or errors while recording the event don’t impact the response.

Here’s what the Nginx config looks like:
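(The original config was embedded as a gist; here is a minimal sketch of the idea, assuming the Symfony app listens on port 8080 and the Go event recorder on port 7000 — note that `post_action` is a real but undocumented Nginx directive.)

```nginx
server {
    listen 80;
    server_name forumforall.com;

    location / {
        # serve the normal Symfony response first
        proxy_pass http://127.0.0.1:8080;
        # then replay the request to the event recorder
        post_action @traildb;
    }

    location @traildb {
        internal;
        # the Go TrailDB server; latency or errors here never reach
        # the visitor, because the main response is already sent
        proxy_pass http://127.0.0.1:7000$request_uri;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```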

I really love GoLang for many reasons; one of them is how easily you can do everything with it. Here I need an HTTP server to handle the user request and feed TrailDB. Importing the “C” package allows us to hydrate TrailDB directly, and it’s already available through the TrailDB API.

I just need an HTTP server (bound to port 7000) that recognizes the requested URL from a YAML file (with a URL pattern and a value), identifies the current user (by IP, for example), and creates an event storing the current user action.
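The original server code was embedded as a gist. Here is a minimal sketch of that idea, with a hardcoded pattern table standing in for the YAML file and the actual TrailDB write left as a comment (in the real server it goes through the TrailDB C bindings via cgo):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"log"
	"net/http"
	"strings"
	"time"
)

// pagePatterns maps a URL prefix to a page name; in the real
// server this table is loaded from a YAML file.
var pagePatterns = map[string]string{
	"/registration/form": "registration_form",
	"/registration/ok":   "registration_ok",
	"/pricing":           "pricing",
	"/features":          "feature",
	"/":                  "index",
}

// pageOf returns the page name whose pattern best matches the
// requested path (longest prefix wins), or "" when nothing matches.
func pageOf(path string) string {
	best, page := "", ""
	for prefix, name := range pagePatterns {
		if strings.HasPrefix(path, prefix) && len(prefix) > len(best) {
			best, page = prefix, name
		}
	}
	return page
}

// userHash identifies a user by the md5 of their IP address.
func userHash(ip string) string {
	sum := md5.Sum([]byte(ip))
	return hex.EncodeToString(sum[:])
}

func handler(w http.ResponseWriter, r *http.Request) {
	page := pageOf(r.URL.Path)
	if page == "" {
		return // not a tracked page
	}
	user := userHash(r.Header.Get("X-Real-IP"))
	// The real server appends the event to the user's trail through
	// the TrailDB constructor (cgo bindings), roughly:
	//   cons.Add(user, time.Now(), []string{page})
	log.Printf("event: user=%s page=%s at=%s", user, page, time.Now().Format(time.RFC3339))
}

func main() {
	http.HandleFunc("/", handler)
	fmt.Println("recording events on :7000")
	log.Fatal(http.ListenAndServe(":7000", nil))
}
```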

At this point, I can store user trails quickly without impacting my Symfony project. There is one trail per user, composed of events. Each event has a datetime, a user hash (md5 of the user IP) and a “string” action (the page name). Now I need to extract this data to a CSV file before processing it.

For this post, the data output is very basic, but we could add time spent on each action, location, timezone, referrer or other information to get a more accurate evaluation. Currently, the export looks like this: each line is a user trail, and I collected 10,000 trails in a few weeks.

Transformation boolean and user navigation
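As a sketch of what the export step produces — the column layout here is my assumption from the screenshot: a transformation boolean first, then the visited pages, padded with empty fields so every line has the same width:

```go
package main

import (
	"fmt"
	"strings"
)

// trailToCSV flattens one user trail into a CSV line: the
// transformation boolean first, then up to maxPages visited
// pages, padded with empty fields.
func trailToCSV(converted bool, pages []string, maxPages int) string {
	fields := make([]string, 0, maxPages+1)
	if converted {
		fields = append(fields, "1")
	} else {
		fields = append(fields, "0")
	}
	for i := 0; i < maxPages; i++ {
		if i < len(pages) {
			fields = append(fields, pages[i])
		} else {
			fields = append(fields, "") // becomes "null" during sanitization
		}
	}
	return strings.Join(fields, ",")
}

func main() {
	fmt.Println(trailToCSV(true, []string{"index", "pricing", "registration_form"}, 5))
	// → 1,index,pricing,registration_form,,
}
```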

Use machine learning to process this data

I chose TensorFlow to compute this data. The main reason is the existing ecosystem around it: GPU support, Google Cloud hosting and APIs. Here are some parts of the process.

The first step is to sanitize the data: filling empty values and mapping string values to integers.

train_df = train_df.fillna("null")
page_mapping = {"index": 1, "registration_ok": 2, "feature": 3,
                "extra_features": 4, "registration_form": 5,
                "whoweare": 6, "mentions": 7, "pricing": 8, "null": 0}
train_df["page1"] = train_df["page1"].map(page_mapping).astype(int)

Next step, very common in ML: sampling the data. 80% of the file is used to train the model; the remaining 20% is used to evaluate it, with the “transformation” value removed.

# sample 80% for training data
train_set = train_df.sample(frac=0.8, replace=False, random_state=777)
# the other 20% is reserved for cross validation
cv_set = train_df.loc[set(train_df.index) - set(train_set.index)]

Then I apply a Deep Neural Network Classifier to classify the trails. For fun, I also displayed the data as a decision tree: it shows that some pages drastically drop the conversion rate when they are present in the trail.

estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[1024, 512, 256],
    n_classes=8,
    optimizer=tf.train.AdamOptimizer(),
)
estimator.train(input_fn=train_input_fn)

Now let’s evaluate the result:

estimator.evaluate(input_fn=eval_input_fn)
# {'accuracy': 0.85961539,
#  'average_loss': 0.55103552,
#  'global_step': 888,
#  'loss': 57.307697}

After a few iterations, I have an effective and well-trained estimator, so I can save this work using TensorFlow’s SavedModel mechanism.

builder = tf.saved_model.builder.SavedModelBuilder("./model")
# the estimator's graph and variables must be added to the builder
# (e.g. via add_meta_graph_and_variables) before saving
builder.save()

Request the predictive result

Now, each time the user performs a new action on the brand website, I need to query the new predicted result. This works through a PHP HTTP client calling a GoLang HTTP server that returns the trained TensorFlow model’s result. In fact, I could directly call the previous TrailDB HTTP server with the user IP to get a prediction, but that would result in many read accesses…

I chose “tfgo” (TensorFlow in Go) to retrieve the prediction.

The important part here is to query the prediction HTTP server using the same data mapping and sanitization done for the CSV export. It looks like this:
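The original snippet was embedded as a gist. Here is a minimal sketch of the encoding side, reusing the same `page_mapping` as the Python sanitization step; the actual call into the SavedModel through tfgo is left as a comment, since it needs the TensorFlow C library:

```go
package main

import "fmt"

// pageMapping must stay identical to the Python page_mapping
// used when the training CSV was sanitized.
var pageMapping = map[string]int32{
	"null": 0, "index": 1, "registration_ok": 2, "feature": 3,
	"extra_features": 4, "registration_form": 5, "whoweare": 6,
	"mentions": 7, "pricing": 8,
}

// encodeTrail converts a user trail into the fixed-size integer
// vector the model was trained on, padding with 0 ("null").
func encodeTrail(pages []string, size int) []int32 {
	vec := make([]int32, size)
	for i := 0; i < size && i < len(pages); i++ {
		vec[i] = pageMapping[pages[i]] // unknown pages map to 0 ("null")
	}
	return vec
}

func main() {
	input := encodeTrail([]string{"index", "pricing"}, 5)
	fmt.Println(input) // [1 8 0 0 0]
	// With tfgo, this vector is then fed to the SavedModel exported
	// earlier, roughly:
	//   model := tg.LoadModel("./model", []string{"serve"}, nil)
	//   ... build a tf.Tensor from input and run model.Exec(...)
}
```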

The web server returns the prediction value in a JSON response. This prediction is then used to decide whether to display a help modal to the user.

Live support example

The results are encouraging, though I still have to handle some special cases, like useless predictions when the user has browsed fewer than a minimal number of pages.

Thanks for reading, and feel free to share your ideas about this.

Follow me on twitter @scullwm


Written by Thomas P

Symfony lover, Gopher and opensource enthusiast. Ex-firefighter 🚒, I miss cutting out cars 🚙.
