Internet Archive Research Publication Crawls
collection, 21,177 items, 112.3M views, by Internet Archive Web Group
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files whose captures end up in the Wayback Machine (https://web.archive.org). See also the bibliographic metadata corpora at https://archive.org/details/ia_biblio_metadata

OAI-PMH-CRAWL-2020-06
collection, 2,946 items, 5.4M views, by Internet Archive Web Group

MSAG-PDF-CRAWL-2017
collection, 1,855 items, 12.2M views, by Internet Archive Web Group
PDF URLs from the Microsoft Academic Graph public corpus (Feb 2016), filtered to remove large sites (PubMed, CiteSeerX, arXiv) and URLs that had already been crawled.
Topics: papers, journals

UNPAYWALL-PDF-CRAWL-2018-07
collection, 1,241 items, 15M views, by Internet Archive Web Group
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.

OA-JOURNAL-CRAWL-2020-07
collection, 1,923 items, 10.1M views, by Internet Archive Web Group

Open Access Journal Test Crawl (2018)
collection, 794 items, 11.1M views, by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2019-04
collection, 641 items, 5.6M views, by Internet Archive Web Group

MAG-PDF-CRAWL-2020-03
collection, 489 items, 3.9M views, by Internet Archive Web Group

DIRECT-OA-CRAWL-2019
collection, 2,566 items, 5.6M views, by Internet Archive Web Group

OA-DOI-CRAWL-2020-02
collection, 278 items, 3.3M views, by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-03
collection, 344 items, 1.9M views, by Internet Archive Web Group

DATACITE-DOI-CRAWL-2020-01
collection, 1,417 items, 4M views, by Internet Archive Web Group

JOURNALS-PATCH-CRAWL-2022-01
collection, 104 items, 854,132 views

CORE-UPSTREAM-CRAWL-2018-11
collection, 741 items, 1.9M views, by Internet Archive Web Group
Crawl of "upstream" URLs from the CORE (core.ac.uk) metadata dump. Only part of the seed list was crawled.

DOI-LANDING-CRAWL-2018-06
collection, 279 items, 3.4M views, by Internet Archive Web Group

MAG-PDF-CRAWL-2021-08
collection, 189 items, 762,930 views

MAG-PDF-CRAWL-2020-07
collection, 196 items, 1.7M views, by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2021-07
collection, 174 items, 1M views

UNPAYWALL-PDF-CRAWL-2020-11
collection, 199 items, 1.7M views, by Internet Archive Web Group

Wide Web Targeted PDF Crawling (2017)
collection, 922 items, 3.1M views, by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2020-05
collection, 282 items, 1.7M views, by Internet Archive Web Group

SEMSCHOLAR-DIRECT-PDF-CRAWL-2020-02
collection, 1,011 items, 1.4M views, by Internet Archive Web Group

OA-DOI-CRAWL-2020-12
collection, 191 items, 1.5M views, by Internet Archive Web Group

PLATFORM-CRAWL-2020
collection, 649 items, 460,547 views, by Internet Archive Web Group

OA-JOURNAL-CRAWL-2019-08
collection, 201 items, 2.8M views, by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-04
collection, 219 items, 268,235 views

collection, 1.9M views
Internet Archive crawl of PDF URLs provided by Semantic Scholar.
Topic: pdf

UNPAYWALL-PDF-CRAWL-2021-05
collection, 123 items, 906,382 views, by Internet Archive Web Group

CiteSeerX URL Crawl 2017
collection, 207 items, 1.2M views
A targeted crawl to fetch research publications on the public web that had been crawled by CiteSeerX but not previously by the Internet Archive.
Topics: scholarly, papers, journal

DOAJ-CRAWL-2020-11
collection, 102 items, 959,382 views, by Internet Archive Web Group

OAI-PMH-PATCH-CRAWL-2021-12
collection, 75 items, 334,122 views

JOURNAL-HOMEPAGE-CRAWL-2022-03
collection, 44 items, 294,784 views

DOI-CRAWL-2022-02
collection, 25 items, 243,883 views

PubMed Central Crawl (2019-10)
collection, 216 items, 431,481 views, by Internet Archive Web Group

PUBMEDCENTRAL-CRAWL-2020-02
collection, 108 items, 249,540 views, by Internet Archive Web Group

TARGETED-ARTICLE-CRAWL-2022-03
collection, 9 items, 51,601 views

SCIELO-CRAWL-2020-07
collection, 41 items, 195,586 views, by Internet Archive Web Group

UNPAYWALL-PDF-CRAWL-2022-04
collection, 38 items, 17,825 views

arXiv Content Crawl (2019-10)
collection, 37 items, 79,989 views, by Internet Archive Web Group

ARXIV-PUBMEDCENTRAL-CRAWL-2020-04
collection, 60 items, 114,355 views, by Internet Archive Web Group

Tianchi V700 KTV
collection, 3,697 items, 99,417 views
Music, instrumentals, wistful backgrounds, and music to sing Korean hits to.
Topics: karaoke, North Korea

arXiv.org Bulk Content
collection, 6,767 items, 172,437 views, by arxiv.org
This collection contains PDF and LaTeX source copies of content from the arxiv.org preprint server, in the bulk-access format that arXiv provides via AWS S3. More information is available from arXiv (https://arxiv.org/help/bulk_data_s3). Note that direct access to the internal PDF files is possible, for example (https://archive.org/download/arXiv_pdf_0001_001/arXiv_pdf_0001_001.tar/0001%2Fastro-ph0001001.pdf). However, we strongly prefer that folks access these files via the individual items associated with each...

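The single example URL above suggests a pattern for reaching one PDF inside a bulk tar file: item identifier, then a tar named after the item, then the in-tar path with its "/" percent-encoded as "%2F". The Python sketch below rebuilds that pattern. It assumes every bulk item follows the same naming as the arXiv_pdf_0001_001 example, which the description does not state; tar_member_url is a hypothetical helper, not an archive.org API, and the collection itself prefers access through the individual items.

```python
from urllib.parse import quote


def tar_member_url(item_id: str, member_path: str) -> str:
    """Build a direct URL to a file stored inside an item's tar archive.

    Hypothetical helper. Assumes the tar is named '<item_id>.tar', as in
    the arXiv_pdf_0001_001 example, which may not hold for every item.
    """
    base = "https://archive.org/download"
    # safe="" makes quote() encode "/" too, giving "%2F" as in the example URL
    encoded_member = quote(member_path, safe="")
    return f"{base}/{item_id}/{item_id}.tar/{encoded_member}"


if __name__ == "__main__":
    # Rebuilds the example URL given in the collection description
    print(tar_member_url("arXiv_pdf_0001_001", "0001/astro-ph0001001.pdf"))
```

Other item and path names should be checked against the actual item's file listing before relying on this pattern.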
CiteSeerX URL Crawl 2017
web item, 12,545 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:27:34 PDT 2017 to Wed Jul 5 05:39:37 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 9,523 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:35:43 PDT 2017 to Thu Jul 6 01:46:47 PDT 2017.
Topic: crawldata

Tor Project Archives
collection, 4,580 items, 29,024 views, by The Tor Project
Archived versions of Tor Browser and other Tor Project artifacts. This collection is maintained by the Tor Project organization for historical interest and research use, not as a primary installation mechanism. Please visit https://torproject.org/ to download and install Tor software.

CiteSeerX URL Crawl 2017
web item, 9,096 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:17:48 PDT 2017 to Wed Jul 5 05:29:23 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 10,117 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:08:27 PDT 2017 to Wed Jul 5 05:22:22 PDT 2017.
Topic: crawldata

Open Science Framework
collection, 95,324 items, 104,853 views, by Center for Open Science
Top-level collection for content mirrored from Open Science Framework (OSF, https://osf.io) repositories into the Internet Archive.

CiteSeerX URL Crawl 2017
web item, 7,999 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:11:26 PDT 2017 to Wed Jul 5 20:24:18 PDT 2017.
Topic: crawldata

OSF Registrations
collection, 95,455 items, 104,559 views, by Center for Open Science
Top-level collection for archiving Open Science Framework (OSF) Registrations into the Internet Archive, as part of a collaboration with the Center for Open Science.

CiteSeerX URL Crawl 2017
web item, 11,532 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:26:46 PDT 2017 to Thu Jul 6 01:37:50 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 10,387 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:53:13 PDT 2017 to Thu Jul 6 02:05:47 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 12,053 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:57:06 PDT 2017 to Wed Jul 5 06:10:16 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 11,951 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 13:06:40 PDT 2017 to Wed Jul 5 06:20:59 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 8,071 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:48:17 PDT 2017 to Wed Jul 5 05:01:28 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,743 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:36:41 PDT 2017 to Wed Jul 5 05:48:54 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 8,271 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:19:51 PDT 2017 to Wed Jul 5 20:32:06 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 10,944 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:45:15 PDT 2017 to Thu Jul 6 01:55:13 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 8,123 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:38:04 PDT 2017 to Wed Jul 5 04:54:20 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,871 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:08:25 PDT 2017 to Thu Jul 6 01:20:22 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 8,634 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:42:06 PDT 2017 to Thu Jul 6 00:54:28 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 9,512 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:01:08 PDT 2017 to Thu Jul 6 02:12:56 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 5,071 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 14:52:50 PDT 2017 to Thu Jul 6 08:15:06 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,885 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:17:51 PDT 2017 to Wed Jul 5 00:28:29 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,990 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:56:02 PDT 2017 to Thu Jul 6 00:08:40 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,733 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 06:07:30 PDT 2017 to Tue Jul 4 23:19:25 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,895 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:33:05 PDT 2017 to Wed Jul 5 01:46:32 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 10,851 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:25:08 PDT 2017 to Wed Jul 5 18:40:27 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,977 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 13:16:31 PDT 2017 to Wed Jul 5 06:29:05 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,569 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:06:40 PDT 2017 to Wed Jul 5 04:21:44 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 8,602 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 12:45:37 PDT 2017 to Wed Jul 5 05:59:07 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 9,258 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:18:25 PDT 2017 to Thu Jul 6 01:29:26 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,074 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:48:00 PDT 2017 to Wed Jul 5 19:01:22 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,260 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:28:50 PDT 2017 to Wed Jul 5 20:41:42 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,234 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:32:04 PDT 2017 to Thu Jul 6 00:46:06 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,593 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:51:37 PDT 2017 to Thu Jul 6 01:02:46 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 5,513 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 02:41:55 PDT 2017 to Wed Jul 5 19:54:59 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,498 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:26:10 PDT 2017 to Wed Jul 5 00:38:51 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,055 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:14:32 PDT 2017 to Thu Jul 6 00:28:30 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,544 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:59:15 PDT 2017 to Wed Jul 5 05:11:17 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,849 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:47:41 PDT 2017 to Wed Jul 5 20:59:49 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,443 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 06:35:45 PDT 2017 to Wed Jul 5 23:47:46 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,439 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 08:00:34 PDT 2017 to Thu Jul 6 01:11:41 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,367 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 07:24:37 PDT 2017 to Thu Jul 6 00:35:21 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,422 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 01:56:59 PDT 2017 to Wed Jul 5 19:14:04 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 5,808 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:56:50 PDT 2017 to Tue Jul 4 23:09:37 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 4,741 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 15:07:36 PDT 2017 to Thu Jul 6 08:28:04 PDT 2017.
Topic: crawldata

Dat Early Days Collection
collection, 4 items, 6,584 views
'dat' is a distributed web data archiving and transfer tool, originally developed by Code for Science, a grant-funded US non-profit. This collection preserves a selection of early and experimental dat archives. Note that important dat metadata is contained in a '.dat/' subdirectory, which is not displayed in "download" file listings by default, but can be browsed and downloaded from archive.org over HTTP(S) as expected.
Topics: dat, distributed web

CiteSeerX URL Crawl 2017
web item, 6,986 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 04:05:06 PDT 2017 to Wed Jul 5 21:19:08 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,748 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:44:33 PDT 2017 to Wed Jul 5 00:57:33 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,507 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:38:19 PDT 2017 to Wed Jul 5 20:50:10 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,769 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 10:48:22 PDT 2017 to Wed Jul 5 04:00:37 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,444 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:09:06 PDT 2017 to Thu Jul 6 02:22:13 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,733 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 11:27:04 PDT 2017 to Wed Jul 5 04:42:02 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 8,063 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 05:01:45 PDT 2017 to Tue Jul 4 22:50:03 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,373 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 07:07:49 PDT 2017 to Wed Jul 5 00:18:34 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,126 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc284.us.archive.org:CITESEERX-CRAWL-2017 from Wed Jul 5 08:03:45 PDT 2017 to Wed Jul 5 01:16:31 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,259 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 10:48:58 PDT 2017 to Thu Jul 6 04:03:15 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 6,305 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 09:29:23 PDT 2017 to Thu Jul 6 02:41:56 PDT 2017.
Topic: crawldata

CiteSeerX URL Crawl 2017
web item, 7,057 views
Internet Archive crawldata of uncrawled CiteSeerX PDF URLs captured by wbgrp-svc285.us.archive.org:CITESEERX-CRAWL-2017 from Thu Jul 6 03:56:45 PDT 2017 to Wed Jul 5 21:07:29 PDT 2017.
Topic: crawldata