In this lecture it is important not to go too slowly. The instructor must touch on the many topics covered here without bogging down, since our goal is to work down to fundamental questions. A slow and detailed exposition would spend two weeks on this one slide set and yet would be almost entirely superficial! You might consider trimming slides if you don't like my pace on the first parts of the lecture, where I tend to really blast through the material.
The lecture starts by asking "what really happens" when a program first tries to find and connect with a Web Service. The point is that with this architecture one often has many options for a desired service.
In this lecture hall, I want my slides to project on the overhead projector. But the room is full of devices that might support that interface, even including cell telephones, other laptops, the projector just on the other side of the wall in the next room (perhaps it can "hear" me), etc. How will my computer wisely pick the right device?
The main topic here is discovery: naming. My objective is for the students to realize that there are many "layers" at which name resolution occurs: the URLs in a web page, which are often generated on demand just for the client; the DNS mapping of names to IP addresses; the interpretation of IP addresses; the routing infrastructure; load balancers; and so on. Logic may be needed both on the client side of each of these layers and on the server side.
Yet Web Services has a rather inadequate naming mechanism: services describe themselves using WSDL and are advertised through UDDI. The lecture spends a few minutes looking at a WSDL description and a UDDI database to illustrate the ideas. Beyond this we can hack solutions, like the DNS mechanisms just mentioned, but there is no uniformity and poor platform support for such things.
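To make the gap concrete for students, here is a minimal Python sketch of a directory lookup. It is a toy in-memory stand-in, not the UDDI API, and every name in it (ToyRegistry, Endpoint, the example.com URLs, the region labels) is hypothetical. The point it illustrates: a directory can say who offers a service, but choosing among the answers is left to logic the client must supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One advertised instance of a service (all fields are hypothetical)."""
    service: str  # logical service name the client asks for
    url: str      # where this instance can be reached
    region: str   # coarse location hint


class ToyRegistry:
    """An in-memory stand-in for a service directory; not the UDDI API."""

    def __init__(self):
        self._entries = []

    def publish(self, endpoint):
        self._entries.append(endpoint)

    def find(self, service):
        # A lookup answers "who offers this?", not "which one should I use?"
        return [e for e in self._entries if e.service == service]


def pick(candidates, client_region):
    """Client-side policy: prefer the caller's region, else the first entry."""
    if not candidates:
        raise LookupError("no endpoint advertised for this service")
    for endpoint in candidates:
        if endpoint.region == client_region:
            return endpoint
    return candidates[0]


registry = ToyRegistry()
registry.publish(Endpoint("product-popularity", "http://dc-east.example.com/popularity", "us-east"))
registry.publish(Endpoint("product-popularity", "http://dc-west.example.com/popularity", "us-west"))

candidates = registry.find("product-popularity")
chosen = pick(candidates, "us-west")
print(len(candidates), chosen.url)
```

The pick function is exactly the kind of ad hoc, per-application logic that has no standard home in the architecture.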
The key thing to explain to the students is that unlike prior architectures like CORBA, which emerged as ground-up designs covering "all the issues" for a style of client-server computing, Web Services is designed for a maximum of compatibility with web sites and the way they serve up documents.
One consequence is that if we look at a hard problem like service discovery, we often need to think about it not as a question all by itself but rather as a generalization of something web sites do. The service discovery problem is about linking a client application on some machine to a service it wants to get some response out of, perhaps the Amazon.com product popularity service. But Amazon has many data centers and many nodes, this service might run in several of them, and the number of nodes running it might vary. Discovery – binding the client to the “right one” – is hard.
Making it harder is that Web Services treats questions like this as "analogous" to other questions, such as the one of finding the right server to get a page from in a large web center hosting content -- a CDN. When a client accesses a CDN to download a web page or picture, the request needs to be directed to the right place. This is a lot like sending a request from a client to a data center which has a representative of that service running on it.
To see the mechanisms that come into play when a CDN serves up a page, we looked at how Akamai serves pages. Akamai partners with cnn.com to host pictures by caching their content. Clients fetching a page from CNN get one with rewritten URLs that point into Akamai, and then when you render the page, those pictures are copied out of the Akamai content service instead of directly from CNN (this is sometimes slower, as seen in the study at the end of the slide set, but is usually more reliable).
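The URL rewriting step can be sketched in a few lines of Python. This is only an illustration of the idea, not how CNN or Akamai actually rewrite pages: the CDN host name cdn.example.net, the rewritten URL layout, and the function name rewrite_image_urls are all assumptions, and a real system would use an HTML parser rather than a regular expression.

```python
import re
from urllib.parse import urlsplit

CDN_HOST = "cdn.example.net"  # hypothetical CDN host name
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def rewrite_image_urls(html, origin_host):
    """Point image URLs served by origin_host at the CDN instead.

    The origin host is kept as the first path component so the CDN
    can tell whose content it is being asked for (an assumed layout).
    """
    def repl(match):
        parts = urlsplit(match.group(2))
        is_image = parts.path.lower().endswith(IMAGE_SUFFIXES)
        if parts.netloc == origin_host and is_image:
            query = "?" + parts.query if parts.query else ""
            new_url = f"http://{CDN_HOST}/{origin_host}{parts.path}{query}"
            return f"{match.group(1)}{new_url}{match.group(3)}"
        return match.group(0)  # leave everything else untouched

    # Only src="..." attributes are considered; links (href) stay as they are.
    return re.sub(r'(src=")([^"]*)(")', repl, html)


page = ('<img src="http://www.cnn.com/images/story.jpg">'
        '<a href="http://www.cnn.com/news.html">news</a>')
print(rewrite_image_urls(page, "www.cnn.com"))
```

Note that the page itself still comes from the origin site; only the embedded pictures are redirected, which is why the client ends up talking to two different infrastructures to render one page.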
We saw that there are many levels at which Akamai needs control. It needs to worry about load balancing requests to "akamai.com" and does this partly by handing out fake sub-domain names like g.1234.akamai.com, and partly by having many servers for the root of each of those subdomains. Then there is an internal level of routing associated with categories of content and sizes of objects. Akamai tries to route a client’s request to a “nearby” content server. Finally you get down to the level of clusters of nodes that replicate the content and can return copies to that client system.
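The three levels just described (made-up sub-domain names, a "nearby" region, then a replica within a cluster) can be sketched as follows. This is a toy model under stated assumptions, not Akamai's actual algorithm: the topology, the distance table, the hashing scheme, and all names are hypothetical.

```python
import hashlib

# Hypothetical topology: server region -> cluster of replica nodes.
CLUSTERS = {
    "us-east": ["e1", "e2", "e3"],
    "us-west": ["w1", "w2"],
}
# Hypothetical "distance" from a client location to each server region.
DISTANCE = {
    ("ny", "us-east"): 1, ("ny", "us-west"): 4,
    ("sf", "us-east"): 4, ("sf", "us-west"): 1,
}


def stable_hash(text):
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


def virtual_name(content_class, buckets=4):
    """Level 1: spread content classes over made-up sub-domain names."""
    return f"g{stable_hash(content_class) % buckets}.cdn.example.net"


def nearest_region(client_location):
    """Level 2: choose the server region closest to the client."""
    return min(CLUSTERS, key=lambda region: DISTANCE[(client_location, region)])


def replica_for(region, object_path):
    """Level 3: pick one node in the cluster, consistently per object."""
    nodes = CLUSTERS[region]
    return nodes[stable_hash(object_path) % len(nodes)]


def route(client_location, content_class, object_path):
    region = nearest_region(client_location)
    return virtual_name(content_class), region, replica_for(region, object_path)


print(route("sf", "images", "/images/story.jpg"))
```

Each function is a separate decision point with its own policy, which is the structure that reappears when the "object" being fetched is an RPC request rather than a picture.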
In working with Web Services, what we run into is that same structure, but now instead of worrying only about serving up web pages like Akamai does, we are trying to send each RPC request to the "right" server to handle it. But a weakness of the WS architecture is that many of the needed features are simply missing from the overall design. Hari Balakrishnan (MIT) has published on this problem, in fact. Moreover, things we knew how to do in past architectures might not solve the same problems in Web Services. For example, CORBA styles of solution probably would be rejected by the WWW community, because anything added to Web Services is expected to parallel the ways that documents can be fetched from complex data centers or web sites.
Later in CS514 we will be focused on small, clear, self-contained mechanisms such as "a protocol for replicating data for fault-tolerance". But we need to always keep in mind that any mechanisms we explore would also need to be used in web services settings. And not everything is permitted -- to convince the WWW consortium, a mechanism needs to fit their expectations, which means it needs to be as parallel to some web site mechanism as possible!
This is why Hari Balakrishnan and some colleagues are arguing for a new multi-layer naming and discovery architecture. Presumably, if the architecture were standard we would also have standard ways to supply the needed logic in order to specialize a given "path". The steps in his multi-level architecture are very parallel to the way that we talk to web sites about documents and this increases the chances that the WWW community will consider his proposal.
For example, the client may have a choice: buy the books for CS514 from Amazon, Barnes and Noble, or directly from Springer. Having picked the supplier, perhaps Amazon, Amazon may want this request to go to its New Jersey warehouse; within that warehouse we may want the transaction to hit the right services, the right subgroup within that service, the right member. And then we need to deal with routing... Quite a mess, yet this is the world in which we live today. If you work in distributed systems, it doesn't get better than this!
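The book-buying example can be turned into a small sketch of level-by-level binding with one pluggable policy per level. This is an illustration of the idea of specializing a "path", not the architecture from the proposal mentioned above; the hierarchy, node names, and policy functions are all hypothetical.

```python
# Hypothetical hierarchy: supplier -> warehouse -> service -> subgroup -> member.
TREE = {
    "Amazon": {
        "New Jersey": {"orders": {"books": ["node-1", "node-2"]}},
        "Nevada": {"orders": {"books": ["node-3"]}},
    },
    "Barnes and Noble": {
        "New York": {"orders": {"books": ["node-4"]}},
    },
}


def first_choice(options, request):
    """Default policy: take the first option in sorted order."""
    return sorted(options)[0]


def prefer(key):
    """Build a policy that honours request[key] when it names a valid option."""
    def policy(options, request):
        wanted = request.get(key)
        return wanted if wanted in options else first_choice(options, request)
    return policy


# One pluggable policy per level, applied top-down.
POLICIES = [
    ("supplier", prefer("supplier")),
    ("warehouse", prefer("warehouse")),
    ("service", prefer("service")),
    ("subgroup", prefer("subgroup")),
    ("member", first_choice),
]


def bind(request):
    """Resolve a request one level at a time; return the chosen path."""
    node, path = TREE, []
    for level, policy in POLICIES:
        options = list(node)  # dict keys, or the members of a leaf list
        choice = policy(options, request)
        path.append((level, choice))
        if isinstance(node, dict):
            node = node[choice]
    return path


print(bind({"supplier": "Amazon", "warehouse": "New Jersey",
            "service": "orders", "subgroup": "books"}))
```

In this toy the client states every preference itself; in practice some levels would be decided by the client, some by the supplier, and some by the data center, which is part of why the problem is such a mess.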
Take-aways? Web Services only go "so far" and often, real developers simply must go beyond the limits. Such is the case for naming, and it will happen for other technologies too. We'll see this again and again.