Web of Science Data

This data set is a full extract of the data in UW-Madison’s Web of Science subscription. It is available for research and data science projects. The terms and conditions governing UW-Madison’s copy of the data set are listed below.

Related: The Web of Science database is the searchable web-based interface to the same information.

Gaining Access to the Data Set

To use the data set, you must contact the library and request access.

Please contact the UW-Madison Libraries’ Library Technology Group and indicate your interest in the Web of Science data. Access is granted once our staff have confirmed that you are a current UW-Madison employee or student and that you have read and agree to the terms and conditions below.

Data Set Details

Publisher: Clarivate
Access Policy: Only current UW-Madison faculty, staff, students, and researchers in the United States can use the data.
Permitted Use:
  • Commercial use of the data set or derived data is strictly prohibited
  • Distribution of the data set or derivative data sets created from the original is prohibited
  • The intellectual property rights for the data or derivative data sets are owned by Clarivate and may not be shared
  • After affiliation with UW-Madison ends, individuals may no longer use the data set
Access Mode: Downloadable files
Formats:
  • XML
  • JSON
Number of Records: Approximately 60 million
Size of Data Set: Approximately 300 GB per complete copy in compressed form. Data is divided into multiple files with a maximum of 100,000 records per file and grouped by year.
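
As a rough planning aid, the approximate figures above imply a lower bound on the number of files in one complete copy. The sketch below is only arithmetic on the numbers quoted on this page. The constant and function names are illustrative, not part of the data set, and the real file count will be higher because each year's records end in a partially filled file.

```python
import math

# Approximate figures quoted on this page (not exact counts).
TOTAL_RECORDS = 60_000_000
MAX_RECORDS_PER_FILE = 100_000
COMPRESSED_SIZE_GB = 300


def files_for_year(records_in_year: int) -> int:
    """Number of files needed for one year at the per-file record cap."""
    return math.ceil(records_in_year / MAX_RECORDS_PER_FILE)


# Lower bound on the file count: every file completely full.
min_files = math.ceil(TOTAL_RECORDS / MAX_RECORDS_PER_FILE)  # 600

# Upper bound on the average compressed file size, since more files
# would only make the average smaller.
max_avg_file_gb = COMPRESSED_SIZE_GB / min_files  # 0.5
```

Under these assumptions, one copy holds at least 600 files averaging no more than about 0.5 GB each when compressed, which may help when sizing scratch storage or splitting work into jobs.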

Coverage

This data set includes the citations in the UW-Madison Libraries’ Web of Science subscription, covering material published from 1900 through the most recent complete calendar year. Each spring, the Libraries receive a new complete extract that covers publications through the end of the previous calendar year.

Support Tools

The UW-Madison Libraries have also developed code samples to help researchers get started with the data.

Web of Science Explorer

A simple Python utility to find and read article records in the Web of Science data. The scripts work with a line-delimited JSON format, with one record per line, so the data can be processed one record at a time without using much memory.
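
The one-record-per-line layout is what makes low-memory processing possible: a reader can parse each line on its own and discard it before moving on. The following is a minimal sketch of that idea, not the Web of Science Explorer code itself. The function names are hypothetical, the `.gz` handling assumes the files are gzip-compressed (the page says only "compressed"), and no record field names are assumed.

```python
import gzip
import json
from itertools import islice
from pathlib import Path
from typing import Callable, Iterator, Optional, Union


def iter_records(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Only the current line is held in memory, so file size does not matter.
    Assumption: a ".gz" suffix means the file is gzip-compressed.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def find_records(
    path: Union[str, Path],
    predicate: Callable[[dict], bool],
    limit: Optional[int] = None,
) -> Iterator[dict]:
    """Lazily yield records for which predicate(record) is true.

    Stops after `limit` matches when a limit is given.
    """
    matches = (record for record in iter_records(path) if predicate(record))
    return islice(matches, limit)
```

A caller supplies the file path and a predicate that inspects whatever fields the actual records contain. Because both functions are generators, a search that needs only the first few matches stops reading the file as soon as it has them.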

CHTC Recipes

This project includes examples that show how to use the Web of Science data in UW-Madison’s Center for High Throughput Computing (CHTC) environment.

Individuals who need access to the data set after leaving UW-Madison should contact the Libraries for assistance. Clarivate is willing to discuss special terms, on a case-by-case basis, for researchers who need continued access.

The following excerpt comes from a contract between the Big Ten Academic Alliance (BTAA) and Clarivate.

Rights and Restrictions on Client’s Use of Data Set

(1) Data is to be used only for academic research and data science projects
(2) Data is restricted to the use of faculty, staff, students and researchers at Big Ten Academic Alliance member institutions located in the United States as identified above in the Project Scope. For researchers based outside of the United States, Clarivate requires the right to review access for membership prior to granting access to the data.
(3) Commercial use of the data set or derived data is strictly prohibited
(4) Data License does not include sharing outside Big Ten Academic Alliance institutions (ex. other institutions, government agencies, or corporate entities).
(5) Access to the data archive is dependent on the Institution’s Web of Science back file depth for all institutions. Universities who do not have a current license for the full Web of Science archive must request access for any missing file depth(s) to Clarivate for individual research projects.
(6) Access is contingent on Participating Member maintaining existing Web of Science subscription.
(7) Big Ten Academic Alliance shall make Individual Participating Member Institutions aware of the Rights and Restrictions outlined in the Statement of Work.

Supplemental Terms and Conditions Applicable to the Data Set

With respect to any license of a Custom Dataset, Client may use such Custom Dataset to perform numerical or statistical analyses of data elements derived from a Content Service. In addition, notwithstanding any language to the contrary contained herein, Client may (i) download the Custom Dataset for use in data analytics, and proprietary or third party tools; (ii) use web crawlers to extract patterns from the Custom Dataset; and (iii) create derivative databases consisting of the above-mentioned analytics; provided, however, that all Intellectual Property Rights to such Custom Dataset or derivative databases shall be owned by [Clarivate, formerly Thomson Reuters]; all such rights granted in this clause are limited to Client’s internal, non-commercial use of the Custom Dataset, and Client may not distribute or sublicense to any third party any portion of the Custom Dataset or derivative databases created under this clause. Use of the Custom Dataset may also be limited to a specific project if so designated on the Cover Sheet.

Access Files Online

You must request access for this data set.

See Gaining Access to the Data Set on this page.