Poster: | Detective John Carter of Mars | Date: | Dec 27, 2011 3:01pm |
Forum: | web | Subject: | Re: Why does the wayback machine pay attention to robots.txt |
"The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection."
Poster: | PiRSquared | Date: | Sep 6, 2014 9:20pm |
Poster: | d0c5i5 | Date: | Jan 21, 2015 2:32pm |
Why hasn't this been fixed? I used to find so many things that I can't find anymore, because these domain pirates are buying up barely used/forgotten/lapsed domain names and often putting in a robots.txt (along with countless USELESS ads to nowhere)...
Look, I love collecting old hardware, or resurrecting old hardware from countless places, and doing stuff with it. As with so many Linux/GNU projects, there may be few or scarce references to how it was done, pieces of code, or even small downloads that are completely worthy of being preserved, but as the hardware ages (or the authors literally die), this data gets erased from history, and I'm often left with links to source code/downloads/whatever referenced in forums that point to what was free/open data (even LICENSED as distributable, if GNU/GPL applies, so I doubt the new owner, trying to make a buck off all the people who could end up on the domain they snatched, has any more claim than I do)....
Hmmm... If I were to name my kid "Disney", and Disney died/forgot to fill out a form, etc., would/could I ever wipe out all of the Disney movies from history?
Poster: | PiRSquared | Date: | Jan 21, 2015 2:51pm |
Poster: | rin-q | Date: | Jan 24, 2015 6:23pm |
So the domain has been bought by a reseller, and since a robots.txt file has been added, none of the information that was available two years ago can be reached via the Wayback Machine.
So a good example website would be obakemono dot com.
A big loss for those interested in Japanese folklore, sadly.
Poster: | PiRSquared | Date: | Mar 12, 2018 10:39am |
This post was modified by PiRSquared on 2018-03-12 17:39:04
Poster: | billybiscuits | Date: | Oct 22, 2017 2:47pm |
Poster: | rin-q | Date: | Jan 27, 2015 7:06pm |
This post was modified by rin-q on 2015-01-28 03:06:38
Poster: | PiRSquared | Date: | Jan 27, 2015 7:33pm |
Poster: | rin-q | Date: | Jan 28, 2015 10:03am |
This post was modified by rin-q on 2015-01-28 18:00:57
This post was modified by rin-q on 2015-01-28 18:03:35
Poster: | d0c5i5 | Date: | Feb 21, 2015 1:26pm |
As for how this should be handled, imho robots.txt should only be honored at crawl time. Period. (Especially if the site didn't have a robots.txt back on the crawl date.)
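To make the difference concrete, here is a rough sketch in Python of the two policies: checking a snapshot only against the robots.txt that existed when it was crawled, versus also applying whatever robots.txt the domain serves today. This is not how the Wayback Machine is actually implemented; the function names, the `ia_archiver` agent string, and the example.com URL are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the user-agent name the archive crawler identifies itself with.
ARCHIVE_AGENT = "ia_archiver"


def allowed_by(robots_txt: str, url: str, agent: str = ARCHIVE_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`.

    An empty string stands for "the site had no robots.txt".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def snapshot_visible(url: str, robots_at_crawl: str, robots_today: str,
                     crawl_time_only: bool) -> bool:
    """Decide whether an archived snapshot of `url` should be shown.

    crawl_time_only=True  -> the policy proposed in this post: only the
                             robots.txt captured at crawl time matters.
    crawl_time_only=False -> the retroactive behavior complained about in
                             this thread: today's robots.txt can also hide
                             old snapshots.
    """
    if not allowed_by(robots_at_crawl, url):
        return False  # the owner at the time did not want it archived
    if crawl_time_only:
        return True
    return allowed_by(robots_today, url)


# Hypothetical lapsed domain: no robots.txt when crawled, blocked by the new owner.
old_robots = ""
new_robots = "User-agent: *\nDisallow: /\n"
page = "http://example.com/articles/page.html"

print(snapshot_visible(page, old_robots, new_robots, crawl_time_only=True))
print(snapshot_visible(page, old_robots, new_robots, crawl_time_only=False))
```

With the crawl-time-only policy the old snapshot stays visible after a reseller adds a blanket `Disallow: /`; with the retroactive policy it disappears.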
If someone wants to remove OLD data for a domain they now own AND owned in the past, then they should do the legwork. Archive.org could offer a service where you provide specific proof of ownership, possibly a legitimate claim for why the data should be removed, and perhaps a fee to pay a trusted third party to evaluate your request; then, and only then, should they consider removing the records.
I just think about this, fast forward 50 years, and picture the amount of both unintentional and intentional censorship that will happen, and it makes me sad. I know we are moving into the future, but I think archive.org is one of the shining examples of why the past matters, and it shouldn't be wiped away without a reason.
my 2c,
d0c