Poster: Rob Adams Date: Apr 15, 2003 6:07am
Forum: researchproposals Subject: Research Proposal from Rob Adams

Project name: Research into web link dynamics and its relationship to the search problem
Abstract: We are studying the dynamics of the link structure of the web and, in particular, ways to take advantage of that link structure to construct improved ranking metrics for web searches. In order to validate the ranking metric and the models on which it is based, historic web crawl data is required.
Description: We have developed a new ranking metric, which we call the Quality of a page, that is based on the way the link structure of the web changes over time and attempts to estimate the perceived quality of the page in a more direct manner than PageRank. In developing this metric, we were forced to make a number of assumptions and approximations. In order to validate these assumptions as well as the metric itself, we need access to historic web crawls. We will use this historic data to calculate past page quality values and determine whether the high-quality pages achieve high importance in later crawls, as predicted by our model.

In order to do this, we would require access to fairly old crawl data as well as to fairly recent crawl data. Though we will not need to look at more than a small subset of the total Internet Archive database, we would require access to the full archive so that we can get the historical context that our research requires.

