Poster: PiRSquared Date: Jan 27, 2015 7:33pm
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

Is whois data archived?

Poster: rin-q Date: Jan 28, 2015 10:03am
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

Well, I know of at least one service that provides whois record history, though I haven't tried it personally: DomainTools. They claim to have archived whois records since 1995, and you can access them for a monthly fee. I don't know whether the Internet Archive keeps such records (I can only hope so).

There is another way to decide, at least partially, whether to respect a robots.txt file. The crawler could ignore it at first and run a kind of integrity check comparing the last archived content with the current content. If the two differ too much, which would suggest the domain has changed hands, the robots.txt file should be ignored for already-archived content but still respected for new crawls. Obviously this wouldn't work in every case, but if you ask me, it would still be a better approach. Alternatively, the robots.txt file could simply prevent new crawls while still letting visitors access content that was already crawled.

The current situation feels like a library making a book reference-only and then erasing past borrowers' memories of it because of the new policy. I mean, the data is still there (as you showed me earlier), so why not just allow access to it?
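As a rough illustration of the integrity-check idea, here is a minimal Python sketch. It is not how the Wayback Machine actually works. The similarity measure (difflib's `SequenceMatcher` ratio), the 0.5 threshold, the crawler name `ExampleArchiver` and the helper names are all assumptions made up for the example.

```python
import difflib
from urllib import robotparser

# Assumed cutoff: if the live page is less similar than this to the last
# capture, treat it as a different site (e.g. the domain changed owners).
SIMILARITY_THRESHOLD = 0.5

# Hypothetical crawler name, used only for the robots.txt lookup below.
USER_AGENT = "ExampleArchiver"


def content_similarity(archived: str, current: str) -> float:
    """Return a similarity ratio from 0.0 (nothing shared) to 1.0 (identical)."""
    return difflib.SequenceMatcher(None, archived, current).ratio()


def same_site(archived: str, current: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Integrity check: does the live page still look like the archived one?"""
    return content_similarity(archived, current) >= threshold


def may_crawl(robots_lines: list[str], url: str, agent: str = USER_AGENT) -> bool:
    """Ask the current robots.txt whether new crawls of `url` are allowed."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def hide_old_captures(robots_lines, url, archived, current, agent=USER_AGENT) -> bool:
    """Hide existing captures only if robots.txt blocks us AND the site looks unchanged.

    If the content differs too much, the new robots.txt still blocks new
    crawls (see may_crawl) but is ignored for content archived earlier.
    """
    if may_crawl(robots_lines, url, agent):
        return False
    return same_site(archived, current)


if __name__ == "__main__":
    block_all = ["User-agent: *", "Disallow: /"]
    # Same content: the robots.txt is honored retroactively.
    print(hide_old_captures(block_all, "http://example.com/", "My blog", "My blog"))
    # Very different content (say, a parked domain): old captures stay visible.
    print(hide_old_captures(block_all, "http://example.com/", "My blog", "Buy this domain!"))
```

The simpler alternative from the post, where robots.txt only blocks new crawls, amounts to never hiding old captures at all, so `hide_old_captures` would just return `False`.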
This post was modified by rin-q on 2015-01-28 18:00:57
This post was modified by rin-q on 2015-01-28 18:03:35