22 July 2013

Versioning RESTful Services

As I was explaining Representational State Transfer (RESTful) Service Oriented Architecture (SOA) to my customer, who requires extensive modernization of an enterprise-class system, they asked a very good question:
"Things change over time. If everything is service based, how do we handle versioning of the services without client disruption? When one application suddenly requires different information, we can't update all of the other client applications just to accommodate the needs of the one. We also should not be duplicating services for every client that needs something slightly different."
Excellent question! There are myriad reasons to update a service, but only three major courses of action to follow. One of the three will be a best fit for each situation. Sometimes, things are just that easy. The options are few because the ultimate goal is to maintain the integrity of the URLs. You simply don't want your service URLs changing, for many reasons into which I will not delve in this post. (If you are reading this post, you should know well enough to be nodding in agreement, anyway.)

To be clear, this post covers RESTful services, such as Web API, and not traditional WSDL-based Web Services or object serialization proxy services, like WCF. No, we're talking true REST.

OPTION #1 - Additive Change (Non-breaking change)


The first and most common option is to simply add more information to the response stream. This is a basic form of non-breaking change version control. Employing this methodology makes the massive assumption that the client disregards additional information that it does not already know how to handle. This is a safe assumption when working with enterprise systems, because you have considerable control over how the clients are developed. This is not necessarily true when exposing services to the public, but that's a calculated risk.

Let's use the classic example of a sales system, to illustrate additive change. Version 1.0 of the service returns JSON-format order information for customer id 123:


http://api.theClientSite.com/api/order/123

[{"id":456,"date":20130721,"total":34.26,"type":32}]


The store number at which the sale took place must be added to the order information. We can simply append the store number to the end of the record:

http://api.theClientSite.com/api/order/123

[{"id":456,"date":20130721,"total":34.26,"type":32,"store":987}]

That's pretty painless, and no clients went belly-up. The URL has not changed, and all the original information is included. Quietly deploy the updated service, and then release the updated client. So, the upgrade path looks like this:
  1. Upgrade service to include new data
  2. Upgrade only the clients that need the new data
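
To make the "client disregards additional information" assumption concrete, here is a minimal sketch of a tolerant client parser. The names KNOWN_FIELDS and parse_orders are hypothetical, invented for illustration; the two response bodies are the ones from the example above.

```python
import json

# Fields this hypothetical version 1.0 client knows how to handle.
KNOWN_FIELDS = ("id", "date", "total", "type")

def parse_orders(body):
    """Keep only the fields the client understands; ignore anything else."""
    orders = json.loads(body)
    return [{key: order[key] for key in KNOWN_FIELDS} for order in orders]

v1_body = '[{"id":456,"date":20130721,"total":34.26,"type":32}]'
v2_body = '[{"id":456,"date":20130721,"total":34.26,"type":32,"store":987}]'

# The old client sees identical data before and after the service upgrade.
print(parse_orders(v1_body) == parse_orders(v2_body))  # True
```

A client written this way never notices the new "store" value, which is exactly why the service can be upgraded first and the clients later.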
However, this doesn't always fit business needs. Eventually, some information becomes obsolete and should be removed from the service response, to save transmission resources. Things just got a whole lot more invasive.

OPTION #2 - Negative Change (Breaking change)

I don't mean negative, as in my daughter's attitude towards family activities ("SO lame, dad!"), but as in the opposite of additive. Most people forget the Internet isn't free, and keeping data transmission slim should still be at the forefront of concern. Obsolete information should occasionally be trimmed from service responses. I'm not going to tell you when it's time to do so; you get to decide when your responses need to lose a few kilos. But, be aware that doing so causes pain, and you're the one who gets the blame.

Continuing the example, you are told the order type information has not been used for years, and no clients consume the information. (This is the best-case scenario!) Being a savvy developer, you know that just because the clients don't do anything with the "Type" value doesn't mean that it isn't being parsed from the response stream. In fact, removing "Type" will likely cause several clients to malfunction.
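
Here is a rough sketch of how that happens. The parse_order function is a hypothetical stand-in for a client parser that reads "type" out of the response even though nothing downstream uses it; the response bodies follow the running example.

```python
import json

def parse_order(record):
    # "type" is never used by the rest of the client, but it is still read here.
    return {
        "id": record["id"],
        "total": record["total"],
        "type": record["type"],
    }

with_type = json.loads('[{"id":456,"date":20130721,"total":34.26,"type":32,"store":987}]')
without_type = json.loads('[{"id":456,"date":20130721,"total":34.26,"store":987}]')

parse_order(with_type[0])  # works against the current service

try:
    parse_order(without_type[0])
    broke = False
except KeyError as err:
    # The client fails even though it never cared about the value.
    broke = True
    print("client broke on missing field:", err)
```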

We have two upgrade paths to follow. The first is iterative, and the second simply makes the breaking change outright. The iterative path looks like this:
  1. Create new service at: http://api.theClientSite.com/api/orderV2/123
  2. Update client parsers, over time
  3. Take version 1 of the service offline, after all clients are updated
This often works well, but creates the very ugly situation of inconsistent service paths. Plus maintenance overhead. Plus the politics of killing the older version. And on and on... The politicking gets especially grueling in a corporate environment, where development resources are incredibly scarce and decisions are driven by managers educated by the latest edition of whatever magazine they read.
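
For illustration only, the side-by-side period of the iterative path might be sketched like this. The in-memory ORDERS data, the handler names, and the dispatch function are all assumptions made up for the example; a real service would sit behind an actual web framework.

```python
# Hypothetical in-memory data standing in for the order store.
ORDERS = {123: [{"id": 456, "date": 20130721, "total": 34.26, "type": 32, "store": 987}]}

def get_orders_v1(customer_id):
    # Original contract: still includes the obsolete "type" value.
    return ORDERS[customer_id]

def get_orders_v2(customer_id):
    # New contract: "type" has been trimmed from every order.
    return [
        {key: value for key, value in order.items() if key != "type"}
        for order in ORDERS[customer_id]
    ]

# Both versions stay online until every client has moved to orderV2.
ROUTES = {"/api/order": get_orders_v1, "/api/orderV2": get_orders_v2}

def dispatch(path):
    prefix, _, customer_id = path.rpartition("/")
    return ROUTES[prefix](int(customer_id))

print(dispatch("/api/order/123"))
print(dispatch("/api/orderV2/123"))
```

Every entry in that route table is code somebody has to maintain, which is where the overhead and the politics come from.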

99% of the time, it's best to rip the Band-Aid off quickly, by issuing a decree that breaking changes will be made on a certain date, and clients had better be ready for the change. This is not uncommon. A recent example occurred when Twitter made breaking changes to its API, which flushed out countless clients that were no longer being maintained by their creators (it was a good thing!). Features of the service had evolved beyond the capabilities of the API, and things simply had to change, using this preferable model of breaking changes:
  1. Using a temporary URL, release a dummy service that includes the breaking changes, against which updated clients may be tested
  2. Set a release date for breaking changes, and publicize information about new service
  3. Deploy breaking service changes and take dummy service offline
For the uninitiated, those who maintain clients with rigid, brittle architecture will fight tooth and nail, to stay the pain of adaptation as long as possible. Stick to your plan, and don't budge! How you respond to these pressures determines how painful your own future will be: giving in to pressure means nobody will take you seriously in the foreseeable future, and the pressure will compound with every subsequent project.

If you think that's bad, there is a third, more abhorrent option: the map disconnect.

OPTION #3: Map Disconnect (breaking change)

The concept of this change is simple: change the meaning of values.

Let's examine the preferable, but less reliable, mode of simply changing the semantics of the information:

http://api.theClientSite.com/api/order/123

[{"id":456,"date":20130721,"total":34.26,"store":987}]

now returns:

[{"id":456,"date":20130721,"total":654.13,"store":"Yalecrest"}]

Considering this is the same order (ID 456), two pieces of information obviously now mean something different. "Total" used to mean the total price of products in order 456. Now, "total" means "running total owed by this customer, on credit", and "store" is the name of the regional distribution warehouse, instead of the store number. (This may not make sense to you as a developer, but it's what the business rules require.)

The trouble with a semantic map disconnect is that a client that has not been updated may merrily go about parsing out this new information, resulting in the merciless trashing of some poor database. Especially with public-facing services, you want to be very certain that all clients are updated; simply changing the semantics is inherently dangerous. You must take more drastic measures, to enforce client update.
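
A small sketch of that silent failure, under stated assumptions: record_sale and the ledger dictionary are hypothetical stand-ins for an old client and its database, and the warehouse name is quoted so the new payload is valid JSON.

```python
import json

def record_sale(ledger, order):
    # Old client logic: "total" is the price of the order, "store" is a store number.
    ledger[order["id"]] = {
        "order_total": order["total"],
        "store_number": order["store"],
    }

old = json.loads('[{"id":456,"date":20130721,"total":34.26,"store":987}]')
new = json.loads('[{"id":456,"date":20130721,"total":654.13,"store":"Yalecrest"}]')

ledger = {}
record_sale(ledger, old[0])
record_sale(ledger, new[0])  # no exception, no warning

# The running credit balance is now stored as the price of order 456.
print(ledger[456])  # {'order_total': 654.13, 'store_number': 'Yalecrest'}
```

Nothing raises an error here; the client simply overwrites good data with values that mean something else.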

OPTION #3b: 

If you're going to make breaking changes, you may as well raze the data scheme and start over with something more efficient and fitting. Re-naming all of the values ensures clients are updated, and no kludge fixes are attached with duct tape and bubble gum, with some weird voodoo shenanigans taking place that make things appear to run appropriately.

Let's assume the example is public facing and in need of drastic measures, to enforce client update:

http://api.theClientSite.com/api/order/123

[{"id":456,"date":20130721,"total":34.26,"store":987}]

now returns:

[{"id":456,"date":20130721,"runningTotalOnCredit":654.13,"regionalWarehouse":"Yalecrest"}]

There is no way a parser will function properly without an update; the "total" and "store" values are completely missing, which is sure to raise some sort of error.

Which to pick? 3a or 3b?

To contrast against an internal, line-of-business service, let's see how hard it is to rename values in a mortgage accounting application:

[{"id":456,"date":20130721,"principal":736.55,"interest":386.94}]

There are a lot of Generally Accepted Accounting Principles (GAAP) that set naming standards for accounting concepts. You, as a developer, don't get to decide what these are named, and must follow the naming conventions dictated by the accounting department. "Capital gains" must always be named "capital gains"; otherwise, the accounting developer gets confused and pretty upset. In this case, the semantic disconnect is preferred and almost necessary.

Summary

Changes to RESTful services may be made with or without breaking changes. The determining factors are whether the existing schema can support additional data, whether data must be removed from the schema, whether the meaning of data has changed, and whether the clients are in the private (internal/corporate) or public domain. Enforcement of release dates is important!
