Web Butler: Abstract

The World Wide Web has quickly become one of the easiest and most popular means of retrieving information. Keeping up with such a dynamic environment has become a major problem: every day, thousands of new pages are added, page contents change, pages move from one site to another, and pages are deleted. The Web Butler project provides an infrastructure for monitoring these changes in hyperlinked environments. By periodically spidering a set of site URLs that users have registered, the Web Butler collects statistics about incoming and outgoing links, whether those links are internal or external, and the size of pages. The Web Butler can then report directly when links are broken or when pages have changed, a useful service for Web site administrators. A side effect with research implications is that the data the Web Butler collects will provide insight into how hyperlinked environments change over time. Additionally, the Web Butler project will integrate many of the latest technologies, such as WAP content and new data mining techniques.
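
As an illustration only, the per-page statistics described above might be gathered along the following lines. This is a minimal sketch, not the Web Butler's actual implementation: the names LinkExtractor, PageStats, collect_stats, and has_changed are hypothetical, a link is assumed to be internal when its host matches the host of the page it appears on, and a page is assumed to have changed when a hash of its contents differs between two visits. Fetching pages, counting incoming links, and detecting broken links are left out.

```python
import hashlib
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


@dataclass
class PageStats:
    """Hypothetical record of what one visit to a page yields."""
    url: str
    size_bytes: int
    internal_links: list
    external_links: list
    digest: str


def collect_stats(page_url, html):
    """Compute outgoing-link and size statistics for one already-fetched page."""
    parser = LinkExtractor()
    parser.feed(html)
    site_host = urlparse(page_url).netloc
    internal, external = [], []
    for href in parser.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        absolute = urljoin(page_url, href)
        # Assumption: same host means internal, anything else is external.
        if urlparse(absolute).netloc == site_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    body = html.encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    return PageStats(page_url, len(body), internal, external, digest)


def has_changed(previous, current):
    """Assumption: a differing content hash means the page has changed."""
    return previous.digest != current.digest
```

Running collect_stats on the same URL at two different times and passing both results to has_changed gives a simple change report; a real monitor would also need to tolerate trivial differences, such as embedded timestamps, that a raw hash treats as changes.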