
Bulk Bibliographic Metadata

Internet Archive Web Group

This collection contains both external ("upstream") metadata dumps and Internet Archive generated databases and reports on our holdings of papers, books, and other documents.




215
RESULTS


Bulk Bibliographic Metadata
by ISSN
data

eye 313

favorite 1

comment 0

Unlike most ISSN metadata, this mapping file is publicly available.
Bulk Bibliographic Metadata
by Microsoft Academic
data

eye 925

favorite 1

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...
by Internet Archive Web Group
collection

eye 1,358

This collection holds database snapshots (SQL) and bulk metadata exports (JSON and TSV) from https://fatcat.wiki (an Internet Archive service).
Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 54

favorite 0

comment 0

This is a backup of the "Open Academic Search" corpus, published by Semantic Scholar / Allen Institute for AI. For more info see http://labs.semanticscholar.org/corpus/. In particular, note the terms and conditions: Semantic Scholar Open Research Corpus is licensed under ODC-BY. When using the Semantic Scholar Open Research Corpus (“S2 ORC”) in a product or service, or including data in a redistribution, please cite the following paper: Waleed Ammar et al. 2018. Construction...
Bulk Bibliographic Metadata
by aminer.org
data

eye 622

favorite 0

comment 0

A copy of the "Open Academic Graph v2" (OAGv2) corpus published by aminer.org and Microsoft Academic Graph in early 2019. Contains roughly 90 GB (compressed) of bibliographic metadata for hundreds of millions of publications. Related publications include: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data...
Bulk Bibliographic Metadata
by Crossref
data

eye 483

favorite 1

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 99 million DOIs. This was generated by running the scripts at: https://github.com/greenelab/crossref (git commit: 768a49ba1d8ba1971f00471950514716a9f699c8) The script completed on 2018-09-20. Format is xz-compressed JSON (one JSON object per line).
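A dump in this format (xz-compressed, one JSON object per line) can be read as a stream, without first decompressing it to disk. The following is a minimal Python sketch using only the standard library. The helper name `iter_json_lines` and the file name used in the note below are hypothetical and are not taken from the item itself.

```python
import json
import lzma


def iter_json_lines(path):
    """Yield one parsed JSON object per non-empty line of an xz-compressed file.

    The file is decompressed incrementally, so memory use stays small even
    for a dump that is tens of gigabytes in size.
    """
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_records(path, limit=None):
    """Count records in the dump, optionally stopping after `limit` records."""
    total = 0
    for _ in iter_json_lines(path):
        total += 1
        if limit is not None and total >= limit:
            break
    return total
```

For example, `count_records("crossref-works.json.xz", limit=1000)` would parse at most the first 1000 records of a file with that (assumed) name. Streaming keeps memory use flat, at the cost of having to re-read the file for each pass.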
Bulk Bibliographic Metadata
by Microsoft Academic
data

eye 974

favorite 0

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...
Bulk Bibliographic Metadata
by Jan Szczepanski
data

eye 21

favorite 0

comment 0

Downloaded from: https://www.ebsco.com/sites/g/files/nabnos191/files/acquiadam-assets/Jan-Szczepanski-Open-Access-Journals-2018_0.docx
Bulk Bibliographic Metadata
by CiteSeerX Group at PSU
data

eye 184

favorite 0

comment 0

This is a mirror of a CiteSeerX database dump, downloaded from S3. It's hosted here for easy Internet Archive analytics access, and so we don't need to re-pay S3 download fees. See also: http://csxstatic.ist.psu.edu/about/data
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 39

favorite 0

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 83

favorite 0

comment 0

This is a mapping between:
- DOIs (Crossref)
- PubMed PMID and PMCID (NIH)
- CORE record identifier (core.ac.uk)
- Wikidata QIDs

See README and scripts for details.
Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 178

favorite 0

comment 0

Semantic Scholar Open Research Corpus is licensed under ODC-BY. When using the Semantic Scholar Open Research Corpus (“S2 ORC”) in a product or service, or including data in a redistribution, please cite the following paper: Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618 This site is provided by The Allen Institute for Artificial Intelligence (“AI2”) as a service...
Bulk Bibliographic Metadata
by Crossref
data

eye 644

favorite 2

comment 0

This file is a snapshot dump of the Crossref DOI metadata API, containing entries for over 94 million DOIs. Compared to the previous 2017-03 version (see archive.org item "crossref_doi_dump_201703"), this snapshot has a few million more works, but the corpus size is much larger (29 GB compressed vs. 7 GB compressed) as it now contains significantly more citation data, due to the efforts of the Initiative for Open Citations (I4OC) project. This was generated by running the scripts...
Bulk Bibliographic Metadata
by Microsoft Academic
data

eye 58

favorite 0

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...
Bulk Bibliographic Metadata
by Microsoft Academic Search
data

eye 319

favorite 0

comment 0

This is a copy of the Microsoft Academic Graph corpus of scholarly publications and citations, based on crawls from the open web. Metadata (authors, DOI numbers, journals, citations, keywords, affiliations, etc) is included for more than 125 million publications. The corpus is a single 27GB zipfile that extracts into about 96GB of flat tab-separated text files, cross-referenced using identifier columns. Schema information can be found in the `readme.txt` file, and usage restrictions can be...
Bulk Bibliographic Metadata
data

eye 336

favorite 0

comment 0

A snapshot of the oaDOI DOI/URL database, including open access status for each paper. oaDOI is the API backing unpaywall; see oadoi.org for more details. This dataset is intended for NON-COMMERCIAL USE ONLY; contact oaDOI for details or commercial support.
Bulk Bibliographic Metadata
by Microsoft Academic
data

eye 127

favorite 1

comment 0

This is an updated snapshot of the Microsoft Academic Graph corpus. Microsoft generously makes this corpus available at no cost under the ODC-BY "open data license" ( https://opendatacommons.org/licenses/by/1.0/ ). See the link for details; at a minimum this license requires downstream users to acknowledge the creator. You can read more about the corpus, including how to obtain updated copies on Microsoft Azure, a schema reference, etc, at the following URLs and in the following...
Bulk Bibliographic Metadata
by Microsoft Academic
data

eye 135

favorite 0

comment 0

This is a mirror of the RDF dump posted at: http://ma-graph.org/rdf-dumps/
The license provided with this metadata is: Open Data Commons Attribution License (ODC-By) v1.0
Bulk Bibliographic Metadata
by ORCID, Inc.
data

eye 101

favorite 0

comment 0

This item contains an annual copy of the ORCID public data file, as originally downloaded from: https://orcid.figshare.com/articles/dataset/ORCID_Public_Data_File_2020/13066970 More details about this content and its use are available at: https://orcid.org/content/orcid-public-data-file This dataset is available under the public domain (CC-0).
Bulk Bibliographic Metadata
by ORCID, Inc
data

eye 165

favorite 0

comment 0

This item contains an annual copy of the ORCID public data file, as originally downloaded from: https://orcid.org/content/download-file More details about this content and its use are available at: https://orcid.org/content/orcid-public-data-file This dataset is available under the public domain (CC-0). The DOI of this dataset is: https://doi.org/10.6084/m9.figshare.5479792
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 7

favorite 0

comment 0

This item contains hash lists of PDF files crawled from the public web specifically to preserve the scholarly record. It does not contain hashes of *all* PDFs the archive has ever seen, only a subset. Not all of these hashes are necessarily journal articles or other research outputs, but we have reason to believe the large majority are.
Bulk Bibliographic Metadata
data

eye 21

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 127

favorite 0

comment 0

A snapshot of data available through the datacite.org API (v2). 18210075 items, 72GB uncompressed, sha1 of uncompressed file (datacite.ndjson): 6fa3bbb1fe07b42e021be32126617b7924f119fb. Dump created with the dcdump utility. For official data dumps, you might want to follow https://github.com/datacite/datacite/issues/709 . To get a random sample of the data (e.g. 100000 records), you can combine shuf, sort and filterline. $ filterline datacite.sample.ndjson
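The description above lists a sha1 checksum for the uncompressed file and suggests drawing a random sample with shuf, sort and filterline. As an alternative to those tools, the Python sketch below verifies a checksum and draws a uniform sample in a single pass using reservoir sampling (Algorithm R). This is a different technique from the shell pipeline the description refers to, and the function names are hypothetical.

```python
import hashlib
import random


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex sha1 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reservoir_sample(lines, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Algorithm R: keep the first k items, then replace a random slot with
    item number i (0-based) with probability k / (i + 1). The input is
    consumed once and only k items are held in memory.
    """
    rng = random.Random(seed)
    sample = []
    for index, line in enumerate(lines):
        if index < k:
            sample.append(line)
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = line
    return sample
```

With the file name from the description, `sha1_of_file("datacite.ndjson")` could be compared against the digest listed above, and `reservoir_sample(open("datacite.ndjson", encoding="utf-8"), 100000)` would return 100000 lines without loading the whole file. The sample comes back in reservoir order, not in file order.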
Bulk Bibliographic Metadata
by ROAD: Directory of Open Access Scholarly Resources
data

eye 87

favorite 0

comment 0

This is a backup of ROAD/ISSN metadata, downloaded July 3rd, 2017 from http://road.issn.org/en/contenu/download-road-records Dumps in both MARC XML and RDF format are included. These files are under the Creative Commons Attribution-NonCommercial 4.0 International Public License (aka, CC-BY-NC).
Topic: metadata
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 32

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 6

favorite 0

comment 0

Mirrored from: https://isaw.nyu.edu/publications/awol-index/ Note creator request: The content of the The AWOL Index is derived from: Charles E. Jones, AWOL - The Ancient World Online (ISSN 2156-2253), 2009-. That content is re-used and re-mixed here under the terms of AWOL's Creative Commons Attribution Share-Alike 3.0 Unported license. The production and publication of The AWOL Index contributes significant additional value both to the content itself and to its presentation...
Bulk Bibliographic Metadata
by Allen Institute for Artificial Intelligence
data

eye 24

favorite 0

comment 0

This is a backup of the "Open Academic Search" corpus, published by Semantic Scholar / Allen Institute for AI. For more info see http://labs.semanticscholar.org/corpus/. In particular, note the terms and conditions, and the request: We request that any published research that makes use of this data cites the following paper: Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL. ...
Bulk Bibliographic Metadata
data

eye 18

favorite 1

comment 0

Bulk Bibliographic Metadata
by creator
data

eye 20

favorite 0

comment 0

Bulk Bibliographic Metadata
by Impactstory
data

eye 46

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 76

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 27

favorite 1

comment 0

This item contains a complete PostgreSQL SQL database snapshot from https://fatcat.wiki, in binary 'pg_dump tar mode' format. With the exception of the 'abstracts' table (for which no aggregate license or copyright claims can be made; downstream users are responsible for their use), all metadata here is licensed CC-0 (public domain release) and may be used for any purpose. Downstream users are strongly encouraged to provide attribution and link here to the snapshot, as well as give credit to...
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 15

favorite 0

comment 0

Bulk Bibliographic Metadata
by Directory of Open Access Journals
data

eye 48

favorite 0

comment 0

Downloaded from https://doaj.org/csv and the OAI-PMH interface. File names encode the date when data was downloaded.
Bulk Bibliographic Metadata
data

eye 8

favorite 0

comment 0

This item contains a set of "Keeper's Reports" summarizing journal content preservation coverage from major archival services and networks (Portico, LOCKSS, CLOCKSS). See README for links to where these files were downloaded from.
Bulk Bibliographic Metadata
by moreo.info
data

eye 7

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
data

eye 39

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 10

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 6

favorite 0

comment 0

Bulk Bibliographic Metadata
by dblp
data

eye 25

favorite 0

comment 0

Bulk Bibliographic Metadata
by JURN
data

eye 12

favorite 0

comment 0

JURN is a scholarly web search engine implemented as a custom Google search index. A subset of resources is included in a directory at: http://www.jurn.org/directory/ This item contains snapshots of the directory in the form of TSV files. At least to start, these contain only title and URL, but we hope to reconcile the entries with ISSNs or look the ISSNs up.
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 54

favorite 0

comment 0

This item contains an example corpus of citations between scholarly documents, as extracted from the fatcat (https://fatcat.wiki) corpus as of the 2020-08-05 bulk release export. This corpus itself was generated from a fatcat-scholar "intermediate" fulltext dump which is not public, using software in the fatcat-scholar repository in mid-September 2020. See also the README for some more notes, and the "sample" file.
Bulk Bibliographic Metadata
by Impactstory
data

eye 113

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Bulk Bibliographic Metadata
data

eye 18

favorite 0

comment 0

This item contains work-level metadata about papers on academia.edu, obtained through their OAI-PMH interface.
Bulk Bibliographic Metadata
data

eye 10

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 40

favorite 0

comment 0

Bulk Bibliographic Metadata
by DBN / ZDB / EZB
data

eye 32

favorite 0

comment 0

# Journal Homepage Candidates

* starting point: 67519 ISSN, w/o homepage
* find homepages via ZDB JOP and Google SERP

Files:

> zdb_fize_homepage_available.ndj: https://archive.org/download/issn_homepage_candidates_20200530/zdb_fize_homepage_available.ndj

High-probability journal homepage via JOP EZB (German journal database). 12555 links.

```json
{
  "issn": "1425-6959",
  "homepage": "http://wydawnictwo.pttz.org/"
}
```

> issn_serp_minimal.ndj:...
Topic: issn
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 9

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 3

favorite 0

comment 0

Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 19

favorite 1

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 14

favorite 0

comment 0

This is a derivative of https://archive.org/download/ia_papers_manifest_2018-01-25, which contains JSON objects that can be inserted into a fatcat catalog.
Bulk Bibliographic Metadata
data

eye 7

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 28

favorite 0

comment 0

This dump includes all tables (including oauth authentication tables which could be a privacy, but not security, concern). At this time only IA staff have accounts, so the snapshot, which is intended mostly for disaster recovery, is still public.
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 83

favorite 0

comment 0

Bulk Bibliographic Metadata
by dblp
data

eye 19

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 6

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 178

favorite 0

comment 0

Manifest of Internet Archive's identified scholarly works in digital form (eg, journal articles). See README.html for details.
Bulk Bibliographic Metadata
by Japan Link Center
data

eye 29

favorite 0

comment 0

Downloaded from http://japanlinkcenter.org/top/material/material_metadata.html
Fatcat Database Snapshots and Bulk Metadata Exports
by Internet Archive Web Group
data

eye 13

favorite 0

comment 0

See README.md
Bulk Bibliographic Metadata
by ROAD: Directory of Open Access Scholarly Resources
data

eye 138

favorite 0

comment 0

This is a backup of ROAD/ISSN metadata from http://road.issn.org/en/contenu/download-road-records Dumps in both MARC XML and RDF format are included; see sub-directory for date of download. See also earlier July 2017 dump at: https://archive.org/download/road-issn-2017 These files are under the Creative Commons Attribution-NonCommercial 4.0 International Public License (aka, CC-BY-NC).
Topic: metadata
Bulk Bibliographic Metadata
data

eye 13

favorite 0

comment 0

Researchgate.net sitemap as jsonlines. 26453174 docs. Example doc: {   "title": "Detection of mRNA using a digoxigenin end labelled oligodeoxynucleotide probe",   "lastmod": "2020-07-30",   "url": "https://www.researchgate.net/publication/20786402_Detection_of_mRNA_using_a_digoxigenin_end_labelled_oligodeoxynucleotide_probe" }
Bulk Bibliographic Metadata
data

eye 7

favorite 0

comment 0

Mirrored from:  https://github.com/njahn82/vanished_journals/tree/master/data
Bulk Bibliographic Metadata
data

eye 14

favorite 0

comment 0

This is the 2020 "baseline" PubMed/MEDLINE bibliographic metadata corpus, originally published in December 2019. Downloaded from https://www.nlm.nih.gov/databases/download/pubmed_medline.html
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 7

favorite 0

comment 0

About 1 million unique PDFs from Global Wayback before year 2000.
Bulk Bibliographic Metadata
by ORCID, Inc
data

eye 25

favorite 0

comment 0

This item contains an annual copy of the ORCID public data file, as originally downloaded from: https://orcid.org/content/download-file More details about this content and its use are available at: https://orcid.org/content/orcid-public-data-file This dataset is available under the public domain (CC-0). The DOI of this dataset is: https://doi.org/10.14454/07243.2013.001
Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 6

favorite 0

comment 0

Bulk Bibliographic Metadata
by Internet Archive Web Group
data

eye 8

favorite 0

comment 0

Snapshot of Internet Archive (petabox) file-level metadata (eg, PDF hashes) for files under the 'journals' collection as of December 2018. Note: includes a small number of items not actually under the 'journals' collection hierarchy due to how the input item list was generated, and a small fraction (estimate 500?) of items didn't dump successfully. A bit sloppy!
Bulk Bibliographic Metadata
by EZB
data

eye 13

favorite 0

comment 0

See README for details. Scraped from:
http://ezb.uni-regensburg.de/ezeit/services/collections.phtml?bibid=AAAAA&colors=1&lang=en
http://ezb.uni-regensburg.de/ezeit/services/xmloutput.phtml?bibid=AAAAA&colors=1&lang=de#6.2
Bulk Bibliographic Metadata
by Impactstory
data

eye 35

favorite 0

comment 0

A mirror of the Unpaywall (aka oaDOI.org) metadata corpus, primarily consisting of public open access flags for a large number of Crossref-registered DOIs (identifiers representing published journal articles and other works). For more information see: http://unpaywall.org/products/snapshot
Bulk Bibliographic Metadata
data

eye 23

favorite 0

comment 0

OAI-PMH metadata collected from the arxiv.org endpoint, using the arXivRaw schema. Collected in two batches: up through ~2017, then up through May 22nd, 2019.
Bulk Bibliographic Metadata
data

eye 21

favorite 0

comment 0

Bulk Bibliographic Metadata
data

eye 17

favorite 0

comment 0