The seed for Wide00014 was:
- Slash pages from every domain on the web:
-- a list of domains using Survey crawl seeds
-- a list of domains using the Wide00012 web graph
-- a list of domains using the Wide00013 web graph
- Top-ranked pages (up to a maximum of 100) from every linked-to domain, using the Wide00012 inter-domain navigational link graph:
-- a ranking of all URLs that have more than one incoming inter-domain link (rank was determined by the number of incoming links, using Wide00012 inter-domain links)...
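The second seed source above amounts to a per-domain top-k selection over an in-link count. The sketch below is a minimal illustration, assuming the Wide00012 inter-domain link graph is available as (source URL, target URL) pairs and treating the full hostname as the "domain" (a simplification; the real pipeline may have used registered domains). The function names `domain_of` and `top_pages_per_domain` are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    # Simplification (assumption): the whole hostname stands in for the domain.
    return (urlsplit(url).hostname or "").lower()


def top_pages_per_domain(links, max_per_domain=100):
    """links: iterable of (source_url, target_url) pairs from a link graph."""
    incoming = Counter()
    for src, dst in links:
        # Only links that cross a domain boundary count toward a page's rank.
        if domain_of(src) != domain_of(dst):
            incoming[dst] += 1

    by_domain = defaultdict(list)
    for url, count in incoming.items():
        if count > 1:  # keep URLs with more than one incoming inter-domain link
            by_domain[domain_of(url)].append((count, url))

    seeds = {}
    for dom, ranked in by_domain.items():
        # Highest in-link count first; ties broken alphabetically for stability.
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))
        seeds[dom] = [url for _, url in ranked[:max_per_domain]]
    return seeds
```

Counting raw links rather than distinct linking domains follows the wording "number of incoming links"; a variant counting distinct source domains would be more resistant to link farms.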
Wide17 was seeded with the "Total Domains" list of 256,796,456 URLs provided by Domains Index on June 26th, and crawled with max-hops set to "3" and de-duplication set to "on".
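As a rough illustration of what the max-hops and de-duplication settings control, here is a toy breadth-first crawl. It is not Heritrix's implementation: `fetch(url)` is a hypothetical callback standing in for HTTP fetching, hops are counted simply as link distance from a seed, each URL is scheduled at most once, and, when de-duplication is on, a capture whose payload digest was already seen is recorded as a "revisit" (the WARC record type for duplicate payloads) rather than a full response.

```python
import hashlib
from collections import deque


def crawl(seeds, fetch, max_hops=3, dedup=True):
    """Toy breadth-first crawl.

    fetch(url) -> (content_bytes, outlinks) is a hypothetical stand-in for HTTP.
    Returns a list of (url, hops, record_type) tuples in fetch order.
    """
    seen_urls = set(seeds)
    seen_digests = set()
    queue = deque((url, 0) for url in seeds)
    records = []
    while queue:
        url, hops = queue.popleft()
        content, outlinks = fetch(url)
        digest = hashlib.sha1(content).hexdigest()
        if dedup and digest in seen_digests:
            record_type = "revisit"  # identical payload already stored once
        else:
            record_type = "response"
            seen_digests.add(digest)
        records.append((url, hops, record_type))
        # Only schedule outlinks while the next hop stays within max_hops.
        if hops < max_hops:
            for link in outlinks:
                if link not in seen_urls:
                    seen_urls.add(link)
                    queue.append((link, hops + 1))
    return records
```

With `max_hops=3`, a page three links away from a seed is still captured, but its own outlinks are not followed.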
Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from CISCO and the top 1 million domains from Alexa.
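Reading "join" as a union of the two lists (an assumption; an intersection would keep only domains present in both), the seed list could be built as in the sketch below. The `rank,domain` CSV layout is assumed for both inputs, and the helper names are hypothetical.

```python
import csv
import io


def read_ranked_domains(text):
    """Parse assumed 'rank,domain' CSV lines into a list of lowercase domains."""
    return [
        row[1].strip().lower()
        for row in csv.reader(io.StringIO(text))
        if len(row) >= 2  # skip blank or malformed lines
    ]


def join_seed_lists(*lists):
    """Union of several domain lists: duplicates dropped, first-seen order kept."""
    seen = set()
    merged = []
    for domains in lists:
        for d in domains:
            if d not in seen:
                seen.add(d)
                merged.append(d)
    return merged
```

Because the two top-1M lists overlap heavily, the union is expected to hold fewer than 2 million domains.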
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Web wide crawl with initial seedlist and crawler configuration from August 2013.
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Web wide crawl with initial seedlist and crawler configuration from October 2010.
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
- Crawl start date: 09 March, 2011
- Crawl end date: 23 December, 2011
- Number of captures: 2,713,676,341
- Number of unique URLs: 2,273,840,159
- Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
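The data-set figures for the March 2011 crawl can be sanity-checked with a little arithmetic: about 1.19 captures per unique URL, about 78.3 unique URLs per host, and 289 days between the start and end dates. The snippet only reuses the numbers stated above.

```python
from datetime import date

captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069

# Repeat captures of the same URL push this ratio above 1.
captures_per_url = captures / unique_urls
urls_per_host = unique_urls / hosts
# Calendar days from the start date to the end date.
crawl_days = (date(2011, 12, 23) - date(2011, 3, 9)).days

print(f"{captures_per_url:.3f} captures per unique URL")
print(f"{urls_per_host:.1f} unique URLs per host")
print(f"{crawl_days} days between start and end dates")
```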
Apr 9, 2016, by Internet Archive
Internet Archive crawldata from Webwide Crawl, captured by crawl830.us.archive.org:widewebcap from Mon Mar 21 16:55:53 PDT 2016 to Fri Apr 8 20:48:18 PDT 2016.
Topic: crawldata
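Each crawldata item in this listing follows the same description pattern: the capture host, the crawl job name after the colon, then start and end timestamps. A minimal parser is sketched below, assuming only the zone abbreviations PST, PDT and UTC occur and mapping them to fixed offsets; `parse_description` and `TZ_OFFSETS` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets for the zone abbreviations used in these descriptions.
TZ_OFFSETS = {"PST": -8, "PDT": -7, "UTC": 0}

PATTERN = re.compile(
    r"captured by (?P<host>[\w.]+):(?P<job>\w+) from (?P<start>.+?) to (?P<end>.+?)\.$"
)


def parse_timestamp(text):
    # e.g. "Mon Mar 21 16:55:53 PDT 2016"; the weekday token is ignored.
    _weekday, month, day, clock, zone, year = text.split()
    naive = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[zone])))


def parse_description(line):
    """Return host, job, start and end for one description line, or None."""
    m = PATTERN.search(line.strip())
    if m is None:
        return None
    return {
        "host": m["host"],
        "job": m["job"],
        "start": parse_timestamp(m["start"]),
        "end": parse_timestamp(m["end"]),
    }
```

Several descriptions in this listing give an end time earlier than the start time, so a duration computed as `end - start` can come out negative.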
Web wide crawl with initial seedlist and crawler configuration from September 2010.
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun Oct 15 11:28:54 PDT 2017 to Sun Oct 15 06:00:00 PDT 2017.
Topic: crawldata
Aug 13, 2021, by Internet Archive
Internet Archive crawl data from the wide crawl number 18, captured by crawl808.us.archive.org:wide18 from Thu Aug 12 22:36:46 PDT 2021 to Thu Aug 12 16:30:58 PDT 2021.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Sun Oct 16 02:25:38 PDT 2016 to Sat Oct 15 21:19:02 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 29 15:44:55 PDT 2016 to Wed Jun 29 10:43:30 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Wed Jun 29 17:31:53 PDT 2016 to Wed Jun 29 15:11:43 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl835.us.archive.org:wide from Thu Jun 30 10:52:08 PDT 2016 to Thu Jun 30 06:07:26 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Aug 31 21:14:14 PDT 2018 to Fri Aug 31 15:52:39 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Fri Aug 31 21:39:50 PDT 2018 to Fri Aug 31 16:36:25 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Aug 31 22:46:26 PDT 2018 to Fri Aug 31 17:45:42 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl825.us.archive.org:wide from Thu Apr 7 11:34:12 PDT 2016 to Thu Apr 7 06:58:33 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Thu Apr 7 12:24:26 PDT 2016 to Thu Apr 7 08:51:27 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 12:21:53 PDT 2016 to Thu Apr 7 07:27:22 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 12:21:12 PDT 2016 to Thu Apr 7 07:19:38 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Thu Apr 7 12:12:12 PDT 2016 to Thu Apr 7 07:26:33 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 15:00:19 PDT 2016 to Thu Apr 7 09:30:57 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 17:22:05 PDT 2016 to Thu Apr 7 13:08:49 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl412.us.archive.org:wide from Wed Jun 25 19:02:30 PDT 2014 to Wed Jun 25 14:54:55 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Tue Feb 18 06:45:58 PST 2014 to Tue Feb 18 01:16:03 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Mon Jun 20 07:53:17 PDT 2016 to Mon Jun 20 02:01:55 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Tue Oct 9 21:09:32 PDT 2018 to Tue Oct 9 18:23:47 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Wed Jun 25 03:46:57 PDT 2014 to Tue Jun 24 23:57:54 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Sun Sep 30 07:26:37 PDT 2018 to Sun Sep 30 03:22:20 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Jun 26 11:09:31 PDT 2014 to Thu Jun 26 06:10:17 PDT 2014.
Topic: crawldata
Feb 1, 2012, by Internet Archive
Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Tue Jan 17 22:09:27 UTC 2012 to Wed Feb 1 11:19:58 UTC 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 25 03:48:33 PDT 2014 to Tue Jun 24 23:42:02 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sat Sep 1 05:24:22 PDT 2018 to Fri Aug 31 23:58:45 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl815.us.archive.org:wide from Thu Mar 24 13:02:27 PDT 2016 to Thu Mar 24 07:48:54 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 25 08:33:55 PDT 2014 to Wed Jun 25 05:14:42 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Mon Aug 6 09:28:15 PDT 2018 to Wed Aug 8 18:02:43 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Sat Aug 18 11:17:19 PDT 2018 to Sat Aug 18 07:39:16 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl451.us.archive.org:wide from Wed Jun 25 03:48:51 PDT 2014 to Tue Jun 24 23:52:55 PDT 2014.
Topic: crawldata
Nov 14, 2016, by Internet Archive
Internet Archive crawldata from Webwide Crawl, captured by crawl834.us.archive.org:widewebcap from Fri Oct 21 01:30:25 PDT 2016 to Mon Nov 14 13:38:53 PST 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Wed Jun 25 03:47:13 PDT 2014 to Tue Jun 24 23:59:14 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sun Aug 5 03:33:48 PDT 2018 to Tue Aug 7 01:48:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 25 22:28:40 PDT 2014 to Wed Jun 25 17:31:08 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl451.us.archive.org:wide from Wed Jun 25 11:53:45 PDT 2014 to Wed Jun 25 09:06:24 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sun Aug 5 03:16:34 PDT 2018 to Mon Aug 6 16:55:41 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Sun Aug 5 05:36:11 PDT 2018 to Mon Aug 6 06:38:33 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Fri Oct 21 18:47:16 PDT 2016 to Fri Oct 21 13:12:51 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Sun Aug 5 04:40:59 PDT 2018 to Tue Aug 7 09:40:31 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Thu Dec 1 10:17:20 PST 2011 to Thu Dec 1 04:31:54 PST 2011.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Wed Jun 25 08:00:10 PDT 2014 to Wed Jun 25 02:58:26 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Thu May 23 12:04:46 PDT 2013 to Thu May 23 06:33:48 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Fri Feb 24 17:19:34 PST 2017 to Fri Feb 24 11:15:22 PST 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Fri Feb 24 23:48:27 PST 2017 to Fri Feb 24 18:37:03 PST 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Fri Feb 24 19:48:55 PST 2017 to Fri Feb 24 13:38:39 PST 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Fri Feb 24 17:39:15 PST 2017 to Fri Feb 24 11:38:00 PST 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Tue Jun 6 13:09:16 PDT 2017 to Tue Jun 6 07:48:12 PDT 2017.
Topic: crawldata