The Replay Pattern

The problem comes from the following story: you have an application that stores incoming data in a database. One day you notice that your database schema is not well designed, or that one of your computations has been wrong all along. How exactly can you migrate your existing data to a new schema, or recompute your results with the fixed logic?

The answer is replaying the logs. What I would suggest is that for every user action that affects your database, you keep a separate “log table” or “log file” which you can parse and replay easily. It can help you migrate your database or recover from serious application logic mistakes. I would call this an architectural pattern.
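As a rough illustration, here is a minimal Python sketch of such a log. It assumes an in-memory list of JSON lines stands in for the real log table or file; `ActionLog`, `append` and `replay` are hypothetical names, not an existing library.

```python
import json
from typing import Callable, Dict, List


class ActionLog:
    """Hypothetical stand-in for a "log table"/"log file": one JSON line per user action."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def append(self, action: str, user: int, params: dict) -> None:
        # Record the action alongside applying it to the main database.
        self.lines.append(json.dumps({"action": action, "user": user, "params": params}))

    def replay(self, handlers: Dict[str, Callable[[int, dict], None]]) -> int:
        """Re-apply every logged action, in order, through the given handlers.

        Actions without a handler are skipped. Returns how many actions were replayed.
        """
        count = 0
        for line in self.lines:
            entry = json.loads(line)
            handler = handlers.get(entry["action"])
            if handler is not None:
                handler(entry["user"], entry["params"])
                count += 1
        return count


log = ActionLog()
log.append("add_friend", 1, {"local_id": 4})
log.append("add_friend", 3, {"local_id": 7})

# Rebuild a "friend of" table from scratch by replaying the log.
friends = set()
n = log.replay({"add_friend": lambda user, p: friends.add((user, p["local_id"]))})
print(n, sorted(friends))  # 2 [(1, 4), (3, 7)]
```

Keeping the log append-only and separate from the main tables is what lets you throw the main tables away and rebuild them with new logic.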

Let’s see how it may be helpful in a few cases:

# Wrong recipient bug

You have just launched a social network, like 10 minutes ago. Everybody logs in with their Facebook account and can add their Facebook friends as friends on your social network. Sounds cool. But let’s say you have just appeared on TechCrunch, you now have 10,000 accounts, and 10 minutes later you discover that there is a huge bug:

Every 5th “add as friend” operation adds a random stranger on the system as a friend, regardless of the request parameters. Now you’re doomed, right? Let’s say everybody has 5 friends on average, so you have around 50,000 “friend of” records in your database, and guess what: 10,000 of them are just wrong data. People are about to get pissed off.

If you had a replay log table, something like this:

| user_action | user | params |
|---|---|---|
| add_friend | 1 | {facebook_id: 1129778324} |
| add_friend | 1 | {local_id: 4} |
| invite_friend | 2 | {facebook_id: 1129778324} |
| add_friend | 3 | {facebook_id: 249584308} |

then you could replay these logs as follows:

1. Fix the “every 5th add friend” bug.
2. Delete all “friend of” records from the database.
3. Replay all “add_friend” actions, and ignore the “invite_friend” actions: there is nothing wrong with them and, more importantly, we don’t want to send invitations twice.
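The three steps might look roughly like this in Python. This is a sketch: `FACEBOOK_TO_LOCAL` (a made-up mapping from Facebook IDs to local user IDs), `resolve_friend` and `rebuild_friendships` are hypothetical, and an in-memory set stands in for the “friend of” table.

```python
# Log rows mirroring the table above.
LOG = [
    {"action": "add_friend", "user": 1, "params": {"facebook_id": 1129778324}},
    {"action": "add_friend", "user": 1, "params": {"local_id": 4}},
    {"action": "invite_friend", "user": 2, "params": {"facebook_id": 1129778324}},
    {"action": "add_friend", "user": 3, "params": {"facebook_id": 249584308}},
]

# Hypothetical lookup from Facebook IDs to local user IDs.
FACEBOOK_TO_LOCAL = {1129778324: 10, 249584308: 11}


def resolve_friend(params: dict) -> int:
    # Step 1, the fixed logic: always honour the request parameters, never pick a random user.
    if "local_id" in params:
        return params["local_id"]
    return FACEBOOK_TO_LOCAL[params["facebook_id"]]


def rebuild_friendships(log: list) -> set:
    friend_of = set()  # Step 2: start from an empty "friend of" table.
    for entry in log:
        if entry["action"] == "add_friend":  # Step 3: replay only add_friend...
            friend_of.add((entry["user"], resolve_friend(entry["params"])))
        # ...and skip invite_friend, so nobody receives a second invitation.
    return friend_of


friend_of = rebuild_friendships(LOG)
print(sorted(friend_of))  # [(1, 4), (1, 10), (3, 11)]
```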

# Mispricing problem

Let’s say you offer a service priced per unit of time (e.g. a server price per hour), and one day you change the unit price. Because of a bug in your pricing code, the change is not reflected in the bill, and you notice something is wrong before billing the user. If you store all of each user’s consumption data along with timestamps, you can always recalculate the bill with the correct unit price.
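A minimal sketch of that recalculation, assuming each logged consumption record is a `(timestamp, hours)` pair priced at the rate in effect at its timestamp, and assuming a hypothetical `PRICES` history of `(effective from, price per hour)` entries. `Decimal` keeps the money arithmetic free of floating-point rounding.

```python
from bisect import bisect_right
from datetime import datetime
from decimal import Decimal

# Hypothetical price history, sorted by effective date.
PRICES = [
    (datetime(2024, 1, 1), Decimal("0.10")),
    (datetime(2024, 3, 1), Decimal("0.08")),  # the change the buggy code ignored
]


def price_at(ts: datetime) -> Decimal:
    # Pick the last price whose effective date is <= ts.
    idx = bisect_right([start for start, _ in PRICES], ts) - 1
    if idx < 0:
        raise ValueError(f"no price defined for {ts}")
    return PRICES[idx][1]


def recalculate_bill(consumption) -> Decimal:
    """consumption: iterable of (timestamp, hours) records logged for one user."""
    return sum((price_at(ts) * Decimal(hours) for ts, hours in consumption), Decimal("0"))


usage = [
    (datetime(2024, 2, 27, 10), 24),  # priced at 0.10
    (datetime(2024, 3, 2, 9), 10),    # priced at 0.08
]
print(recalculate_bill(usage))  # 3.20
```

Because the bill is derived from the raw records rather than stored, fixing `PRICES` or `price_at` and re-running is all the correction takes.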

Of course, this is a serious and big issue; many companies have entire “billing” and “payment collection” departments with tens of employees working on it full-time. I just wanted to introduce the concept.

# Limitations

There is no silver bullet, and replaying from the logs is not always the savior. First, it comes at a cost: you end up storing at least twice as much data as the database alone. For the mispricing problem, for instance, you would need to have logged all user consumption, ideally one record per unit of time. That is huge and difficult to scale if you have lots of users.

Second, you cannot always replay everything. In the “Wrong recipient bug” example, the invite_friend action was not affected by the bug, but if you replayed it, the invitee would receive the invitation twice, which is really annoying and undesirable. In general, side effects such as e-mail notifications you have sent or files you have deleted on a remote server cannot be undone, and therefore should not be replayed.
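One way to make this explicit is to mark each action type as safe or unsafe to replay, and collect the unsafe ones for manual review instead of re-executing them. This is only a sketch: the `Handler` type, its `replayable` flag and the `replay` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Handler:
    apply: Callable[[dict], None]
    replayable: bool  # False for external side effects (e-mails, remote deletes, ...)


def replay(log: List[dict], handlers: Dict[str, Handler]) -> List[dict]:
    """Replay only actions whose handlers are marked safe; return the skipped entries."""
    skipped = []
    for entry in log:
        handler = handlers[entry["action"]]
        if handler.replayable:
            handler.apply(entry["params"])
        else:
            skipped.append(entry)  # keep them so someone can review them by hand
    return skipped


rows: List[dict] = []
sent_emails: List[dict] = []
handlers = {
    "add_friend": Handler(apply=rows.append, replayable=True),
    "invite_friend": Handler(apply=sent_emails.append, replayable=False),
}
log = [
    {"action": "add_friend", "params": {"local_id": 4}},
    {"action": "invite_friend", "params": {"facebook_id": 1129778324}},
]
skipped = replay(log, handlers)
print(len(rows), len(sent_emails), len(skipped))  # 1 0 1
```

Deciding replayability per action type at design time is cheaper than discovering during a replay that an action sends mail.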

Third, you need to log every action that affects the entities you might want to replay later. This is not that easy; let’s see why. Say you are keeping track of a blog:

| user_action | params |
|---|---|
| new_post | {title: ‘hello world’, author_id: 1} |
| new_comment | {body: ‘good luck’, author_name: ‘John’, post_id: 5} |

Looks cool, but how do you associate the new comment with its post during replay? Maybe you should store post_id in the new_post action too, but how do you know that the new database schema you are migrating to will allow you to set auto-increment fields explicitly? Maybe your new database is a document database and you will not store id fields at all. It is not impossible, but it requires extra code in your replay logic and more careful coding.
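One common workaround (an assumption on my part, not part of the log format above) is to give each entity a stable key, here a UUID, when the action is logged, and have later actions refer to that key. During replay, a `key_to_id` map translates keys into whatever ids the new store assigns, so the log never depends on auto-increment values. `replay_blog` and the `key`/`post_key` fields are hypothetical.

```python
import itertools
import uuid

# Hypothetical log: new_post carries a stable key, and new_comment refers to that key
# instead of the old database's auto-increment post_id.
post_key = str(uuid.uuid4())
LOG = [
    {"action": "new_post", "params": {"key": post_key, "title": "hello world", "author_id": 1}},
    {"action": "new_comment",
     "params": {"body": "good luck", "author_name": "John", "post_key": post_key}},
]


def replay_blog(log):
    next_id = itertools.count(1)  # the new store assigns its own ids
    key_to_id = {}                # stable key -> id in the new store
    posts, comments = {}, []
    for entry in log:
        p = entry["params"]
        if entry["action"] == "new_post":
            new_id = next(next_id)
            key_to_id[p["key"]] = new_id
            posts[new_id] = {"title": p["title"], "author_id": p["author_id"]}
        elif entry["action"] == "new_comment":
            comments.append({"body": p["body"], "author_name": p["author_name"],
                             "post_id": key_to_id[p["post_key"]]})
    return posts, comments


posts, comments = replay_blog(LOG)
print(comments[0]["post_id"])  # 1
```

For a document database, the same map could point to embedded documents instead of numeric ids; only the replay code changes, not the log.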

Fourth, the log lacks transaction information. Let’s say a user action performs T1, and then your system starts a transaction which performs T2, T3 and T4 in order and commits it. From the outside, the log just looks like [T1, T2, T3, T4]. Now suppose T3 fails while you are replaying: you are supposed to roll back and not perform T4 at all, but the log does not tell you where the transaction begins. And what about T1 and T2, which have already been applied; can they even be undone? This is a well-known computer science problem, called transaction processing.
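If the log does record transaction boundaries, the replay can re-create them. The sketch below assumes a hypothetical `tx` field on each log entry and that a transaction's entries are contiguous in the log. It groups them with `itertools.groupby` and applies each group inside a Python `sqlite3` transaction, where `with conn:` commits on success and rolls back on an exception. T3 is made to fail by violating a NOT NULL constraint.

```python
import sqlite3
from itertools import groupby

# Hypothetical log: T1 is standalone, T2..T4 belonged to one transaction "tx-42".
LOG = [
    {"seq": 1, "tx": None, "value": "T1"},
    {"seq": 2, "tx": "tx-42", "value": "T2"},
    {"seq": 3, "tx": "tx-42", "value": None},  # T3: violates NOT NULL during replay
    {"seq": 4, "tx": "tx-42", "value": "T4"},
]


def replay(conn, log):
    failed = []
    # Contiguous entries with the same tx id form one unit; standalone entries get their own.
    for tx, group in groupby(log, key=lambda e: e["tx"] or f"single-{e['seq']}"):
        entries = list(group)
        try:
            with conn:  # commit if the block succeeds, roll back if it raises
                for e in entries:
                    conn.execute("INSERT INTO t (value) VALUES (?)", (e["value"],))
        except sqlite3.IntegrityError:
            failed.append(tx)  # whole unit rolled back: T2 is undone, T4 never runs
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value TEXT NOT NULL)")
failed = replay(conn, LOG)
print(failed, [row[0] for row in conn.execute("SELECT value FROM t")])
# ['tx-42'] ['T1']
```

This only handles rollback inside the database; side effects outside it, like the e-mails mentioned earlier, still cannot be undone this way.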

Just wanted to leave this here. :) Comments are highly appreciated if you have something to discuss.

P.S. I just made up the name “replay pattern”; there is no such thing.