Project Description: Online Grocery Store (analogous to WebVan)

This is an example of an initial project description. It is not very detailed -- it certainly doesn't get to the level of ER diagrams or anything like that. But it is supposed to reflect some thought about the requirements for such a site in the real world. For the WebVan site described here, these include issues of inventory control, delivery scheduling, and unexpected failures in the environment. Similar considerations would apply to an auction site or an airline reservation system.

1. Introduction

This is the website of a high-volume online grocery store. Customers shop over the Internet using a web browser. They order groceries, paying by credit card or perhaps PayPal(tm). They specify a time window for delivery, and the groceries are delivered to their homes.

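The system's basic functions are user login and preference maintenance, placing orders, scheduling deliveries, scheduling the filling of orders in the warehouse, reordering stock, and gathering clickstreams for data mining.

These are outlined in Section 3 below.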

2. Data

Here is a high-level description of the data maintained by the site. This is not (yet) at the level of detail of an E-R diagram or relational schema.

The product catalog

This is a catalog of all items stocked by the grocer. It is fairly large (hundreds of thousands of items). It includes at least UPC, generic name (“peanut butter”), brand name (“Skippy”), size, and price for every item stocked. Note: produce, bulk goods, deli goods, etc. may require special treatment.

The catalog will need to be indexed to allow searches by UPC, name and brand, as well as by a number of general categories like “produce,” “dairy,” or “Mexican food.”
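The indexing requirement above can be illustrated with a small in-memory sketch. It is not a proposed design: the class name CatalogIndex, its methods, and the sample UPC are hypothetical, and a real catalog would live in a database with indexes on these columns.

```python
from collections import defaultdict


class CatalogIndex:
    """Toy catalog with lookups by UPC, name, brand, and category.

    All names here are illustrative assumptions, not part of the
    project description.
    """

    def __init__(self):
        self._by_upc = {}
        # (field, lowercased value) -> set of UPCs
        self._by_key = defaultdict(set)

    def add(self, upc, name, brand, categories):
        self._by_upc[upc] = {
            "upc": upc,
            "name": name,
            "brand": brand,
            "categories": list(categories),
        }
        self._by_key[("name", name.lower())].add(upc)
        self._by_key[("brand", brand.lower())].add(upc)
        for category in categories:
            self._by_key[("category", category.lower())].add(upc)

    def by_upc(self, upc):
        """Return the catalog record for a UPC, or None if unknown."""
        return self._by_upc.get(upc)

    def search(self, field, value):
        """Return the sorted UPCs whose field matches value exactly."""
        return sorted(self._by_key.get((field, value.lower()), set()))


catalog = CatalogIndex()
catalog.add("0001", "peanut butter", "Skippy", ["spreads"])
matches = catalog.search("brand", "skippy")
```

Exact-match lookup is the simplest case; searching by partial name would need a different index structure.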

The catalog includes metadata such as shelf-life, refrigerated/frozen, average sales volume, and supplier information. Such metadata is not generally available to customers.

Quantity on hand

This is maintained by customer purchase and supply chain applications. It will require careful design to avoid lock contention among purchasers.

Suppliers and reordering

The system maintains data to support reordering of items when stock is low. This is envisioned as a workflow system, with triggers to generate reorder requests when customer transactions detect low quantity-on-hand.

We need sufficient information about each supplier to enable orders to be placed. This is effectively an adapter to the supplier's server (if the supplier is online) or to a business process.

Customer data

This is a database containing userid and password, billing and default delivery information, and personalization data such as a customer-modifiable “default shopping list.”

The customer data also includes hidden information like data-mined user preferences and purchase frequencies, in a format TBD.

At the beginning of a customer transaction, her default shopping list is initialized using information such as her data-mined preferences and how long it has been since her last purchase.
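One way this initialization might work is sketched below. The format of the data-mined preferences is still TBD, so the sketch assumes (hypothetically) that they are available as a typical repurchase interval in days for each item, and it suggests any item whose interval has elapsed since the last purchase.

```python
def initial_shopping_list(default_list, repurchase_interval_days,
                          days_since_last_purchase):
    """Start from the customer's own list and append suggested items.

    repurchase_interval_days is an assumed representation of the
    data-mined purchase frequencies: item -> typical days between
    purchases.
    """
    suggested = [
        item
        for item, interval in repurchase_interval_days.items()
        if days_since_last_purchase >= interval and item not in default_list
    ]
    return list(default_list) + sorted(suggested)


shopping_list = initial_shopping_list(
    default_list=["milk"],
    repurchase_interval_days={"milk": 7, "eggs": 14, "flour": 60},
    days_since_last_purchase=20,
)
```

Here eggs are suggested because 20 days exceeds their 14-day interval, while flour is not.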

Order Data

This consists at least of orders that have been placed and not yet filled and delivered. There could be some historical data as well, but orders are eventually archived or deleted.

Scheduling Data - I

There is a fairly static database like MapQuest(tm), which the system can use to schedule routes for delivery vans given a delivery address and a time window. There is also a dynamic database which is the current schedule for each delivery van.

Scheduling Data - II

This is a fairly static database that is used to direct the people/robots that fill orders in the warehouse. For example, where are kumquats kept relative to the loading stations?

Data Warehouse

A database of (post-processed) clickstreams of customer visits. Data mining algorithms are run on this to infer customer behavior patterns and preferences; the inferred preferences are made available to personalize the customer user interface.

Announcements

There is a stream of interesting events or announcements, used by the personalization component to make recommendations or special offers to customers. It includes things like new product announcements and special sales.

3. Operations

User login, preference maintenance

There is nothing unusual about user management. Secure site password maintenance and a secure implementation for credit card information are required. Cookies can be used to identify users automatically. There is a user interface for setting and updating user profiles.

Personalized behavior includes one or more default shopping lists, and a “special offers” or “recommendations” mechanism that can use customer data generated by the data mining component described later.

Placing Orders

A purchase involves the user specifying a delivery time, then adding entries to a grocery list / shopping cart until she is satisfied with the proposed purchase, and finally clicking “buy”. When the system acknowledges a purchase request, that is viewed as a promise: it is very bad (from a customer relations point of view) for the system to accept a purchase request and then fail to deliver all the goods in the specified time window.

Ideally the system should check inventory incrementally. That is, when the user clicks “buy” the system should not respond with a message like “sorry, the following 3 items are out of stock: ...” This test should have been performed when the items were added to the shopping cart. Note there is a special case here when the session starts with a default shopping list that has a number of items already selected.

For scalability reasons a purchase cannot hold locks on the “quantity on hand” records for every item in its shopping cart until the purchase is finalized. This might not affect the performance of a single transaction running alone, but it would tend to serialize concurrent transactions behind locks for frequently purchased items (milk, bread, eggs, Rice Krispies, ...) limiting system throughput. To avoid this behavior, a purchase is structured as a sequence of transactions: each time an item is placed in the shopping cart, the quantity on hand for that item is decremented. This reduces lock contention, and guarantees that the items will be available when the user clicks “buy”. If a user abandons a purchase, the system must run a sequence of “compensating transactions” to move items from the shopping cart back into inventory. Since the compensating transactions involve adding items to inventory, they arguably should never fail.
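The sequence-of-transactions structure described above might be sketched as follows. This is a minimal in-memory illustration, not a real implementation: the Inventory and Cart classes are hypothetical, and a threading.Lock held only for the duration of each call stands in for a short database transaction.

```python
import threading


class Inventory:
    """In-memory stand-in for the quantity-on-hand table (hypothetical)."""

    def __init__(self, stock):
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, item, qty):
        """Short transaction: decrement stock if enough is on hand."""
        with self._lock:
            if self._stock.get(item, 0) < qty:
                return False
            self._stock[item] -= qty
            return True

    def release(self, item, qty):
        """Compensating transaction: return reserved units to stock."""
        with self._lock:
            self._stock[item] = self._stock.get(item, 0) + qty

    def on_hand(self, item):
        with self._lock:
            return self._stock.get(item, 0)


class Cart:
    """Shopping cart that reserves stock as each item is added."""

    def __init__(self, inventory):
        self._inventory = inventory
        self.items = {}

    def add(self, item, qty=1):
        # The lock is released as soon as reserve() returns, so no lock
        # is held while the customer keeps shopping.
        if not self._inventory.reserve(item, qty):
            return False
        self.items[item] = self.items.get(item, 0) + qty
        return True

    def abandon(self):
        """Run one compensating transaction per item in the cart."""
        for item, qty in self.items.items():
            self._inventory.release(item, qty)
        self.items.clear()


inventory = Inventory({"milk": 2})
cart = Cart(inventory)
first_add = cart.add("milk", 2)    # reserves both units
second_add = cart.add("milk", 1)   # refused: nothing left on hand
cart.abandon()                     # returns both units to stock
```

Because the out-of-stock check happens in add(), the customer learns about a shortage when the item goes into the cart rather than at checkout. A real system would also need a timeout to detect abandoned carts, which this sketch leaves out.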

When the transaction decrements a quantity on hand value, it may notice that inventory is low and enqueue a message to the inventory server. The decision when to send such a message involves several parameters including the shelf life of the product, its expected sales volume, and estimated worst-case delivery time.
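As an illustration only, the three parameters could be combined in a simple reorder-point rule: reorder when the quantity on hand would not cover expected sales during the worst-case delivery time, and cap the order size at what can be sold within the shelf life. This rule and every name in the sketch are assumptions; the description above does not specify a formula.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Assumed per-item metadata taken from the product catalog."""
    daily_sales: int      # expected units sold per day
    lead_time_days: int   # estimated worst-case supplier delivery time
    shelf_life_days: int  # days before the item spoils


def reorder_point(stats):
    """Units expected to sell while waiting for a delivery."""
    return stats.daily_sales * stats.lead_time_days


def should_reorder(on_hand, stats):
    """True when an "inventory low" message should be enqueued."""
    return on_hand <= reorder_point(stats)


def max_order_quantity(stats):
    """Cap an order at what can be sold before it spoils."""
    return stats.daily_sales * stats.shelf_life_days


milk = ItemStats(daily_sales=40, lead_time_days=2, shelf_life_days=10)
needs_reorder = should_reorder(on_hand=75, stats=milk)
```

With these numbers the reorder point is 80 units, so 75 on hand triggers a message.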

Scheduling - I

This process computes routes and assigns orders to delivery vans. We choose to do this synchronously at the beginning of a purchase, so the customer knows she can get delivery in an acceptable time window before she invests time in creating an order. (Note that this implies compensating transactions are required when orders are canceled.) One output of this process is a schedule of when completed orders are expected to be available at the loading dock to be put on the van.

Scheduling - II

This process schedules filling shopping carts and delivering them to the loading dock. There are several possible models for this. A tentative simple one generates a worklist for each loading station in the warehouse. A typical entry:

    11:33 - Order number 4278.
    Add 2 UPC=xxxxxxx, 1 UPC=yyyyyyyy
    Forward to station 11.

There are multiple paths through the warehouse, orders are filled concurrently, and the schedule must ensure all the shopping carts for a given delivery van arrive at the loading dock at the time the van is supposed to be loaded.
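A worklist entry like the one shown above could be represented as a small record. The field names below are hypothetical; only the rendered text follows the example entry.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class WorklistEntry:
    """One step in a loading station's worklist (field names assumed)."""
    due: time          # when the cart should be at this station
    order_number: int
    picks: list        # (quantity, UPC) pairs to add at this station
    next_station: int  # where the cart goes next

    def render(self):
        """Format the entry the way it appears on the worklist."""
        picks = ", ".join(f"{qty} UPC={upc}" for qty, upc in self.picks)
        return (
            f"{self.due:%H:%M} - Order number {self.order_number}.\n"
            f"Add {picks}\n"
            f"Forward to station {self.next_station}."
        )


entry = WorklistEntry(
    due=time(11, 33),
    order_number=4278,
    picks=[(2, "xxxxxxx"), (1, "yyyyyyyy")],
    next_station=11,
)
text = entry.render()
```

Keeping the due time as a structured field, rather than only as text, would let the scheduler check that all carts for a van reach the loading dock on time.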

Reordering

This service receives “inventory low” messages from the shoppers and sits at the head of a workflow chain that reorders stock.

There is room for clever algorithms here. Orders should presumably be batched. Factors such as price, promised delivery date, etc., influence the choice of a supplier. This choice could eventually be Web Services (WSDL) based and involve negotiation with multiple suppliers.
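The batching idea might look like the sketch below, which groups pending “inventory low” requests into one order per supplier. Supplier choice is passed in as a function because the selection policy is left open here; the function name, message shape, and supplier names are all assumptions.

```python
from collections import defaultdict


def batch_by_supplier(low_stock_messages, choose_supplier):
    """Group (upc, quantity) requests into one order per supplier.

    choose_supplier is a hypothetical callback mapping a UPC to a
    supplier name; a real policy would weigh price, promised delivery
    date, and so on.
    """
    orders = defaultdict(lambda: defaultdict(int))
    for upc, quantity in low_stock_messages:
        orders[choose_supplier(upc)][upc] += quantity
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {supplier: dict(lines) for supplier, lines in orders.items()}


preferred = {"111": "acme", "222": "dairyco"}
messages = [("111", 10), ("222", 5), ("111", 20)]
orders = batch_by_supplier(messages, preferred.get)
```

Repeated requests for the same item are merged into a single order line, so the two requests for UPC 111 become one line for 30 units.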

Gathering Clickstreams and Data Mining

This involves several services. Important user actions are sent to a service that inserts them into a data warehouse. A data mining process infers user preferences from this data. Eventually these are added to user data, and thus made available to the personalization component of session initiation. Transaction/recovery requirements for this data are minimal.

4. Errors

In a real world version of this site we need to address error conditions. Several that come to mind:

Van Schedules: A van may fall behind schedule because of traffic, or a flat tire, or some other unpredictable event.

Undeliverable: A customer may not be home when the van arrives. You can't leave the order on the front porch, especially if it contains perishable items. Perishable items probably should not be returned to stock. Canned goods could be returned to stock, at the expense of complexity in warehouse scheduling and inventory control.

Supplier Failures and “breakage”: A supplier may promise to deliver by a certain date and fail to do so. Suppose a customer placed an order for something we didn't have in stock, but which was promised to us before the customer's requested delivery time? Sometimes produce spoils before it is expected to (e.g. it was allowed to get warm during shipping). In both cases the effect is similar: we do not have the item that was promised to the customer. We need well-defined procedures for dealing with this.