Thursday, January 26, 2006
4:15 pm
B17 Upson Hall

Computer Science
Spring 2006

Dr. C. Lee Giles

David Reese Professor, School of Information Sciences and Technology
Professor, Computer Science and Engineering
Professor, Supply Chain and Information Systems
The Pennsylvania State University


Next Generation CiteSeer: CiteSeerx


CiteSeer, a public online search engine and digital library for computer and information science, was a radical departure from traditional methods of academic and scientific document access and analysis. CiteSeer, now hosted at Penn State, has over 700,000 documents and has become a popular academic document search engine in science. The current CiteSeer model is also portable, though with some difficulty, and was recently extended to academic business documents (SMEALSearch). CiteSeer is based on these features: actively acquiring new documents, automatic citation indexing, and automatic linking of citations and documents. The new Google Scholar does similar citation indexing and linking. Why has CiteSeer been so popular, and how should it progress? We discuss these questions and the Next Generation CiteSeer project, which will emphasize CiteSeer as a research tool, research service, and researcher facilitator. It will explore new intelligent algorithms for providing improved and new indexes, enhanced document access, expanded and automatic document gathering, collaboratories, new data and metadata resources, active mirroring, and web services. As an example, we discuss our new work on automatic acknowledgement indexing, which provides insight into the impact of acknowledged individuals, funding agencies, and others.