2. The solution

[This post is the second in a series that serialises the One Repo whitepaper in digestible chunks. Do please weigh in with comments! See also Part 1: the problem]

We offer The One Repo (http://onerepo.net) as a solution to these challenges. This is a system, already existing in proof-of-concept form, to gather all the content of all the world’s repositories into a single database, in a uniform format, freely accessible to all as a Web UI, as embeddable widgets, as a set of web services, and as harvestable data.

The One Repo is not a research project, but is built on battle-tested components that are in use in high-volume commercial systems. It has been proven robust, efficient and scalable.

Content policy

The policy of The One Repo is to accept all objects deposited in the included repositories, including:

Actual manuscripts, with full text available.
Metadata records describing manuscripts that are not available. These are important for at least three reasons. First, in some cases, they describe manuscripts that will become freely available after the expiry of an embargo period; second, such metadata records provide a means of discovering the author and requesting a copy directly – a process that may be facilitated by an “ask author for a copy” button; and third, records of manuscripts that should be available (but are not) are important data for tracking compliance of open-access policies.
Associated data-sets, such as specimen photos, matrices for phylogenetic analysis, databases of observations and survey results.

Data objects deposited with third-party services such as GenBank, FigShare or Morphbank are out of scope.

Methods of harvesting and searching

The One Repo seamlessly integrates two approaches: searching remote systems directly, and searching locally harvested data. Harvested data is quicker to access and enables more efficient and accurate faceting and sorting, but harvesting is more expensive to set up and an initial harvest can take some time to complete, so direct searching provides a useful alternative in difficult cases. Different approaches are appropriate for different databases.

Harvesting works by any of these methods:

Metadata transfer using the OAI-PMH protocol
Bulk download of records in any XML format
Bulk download of records in any MARC-based format
Any XML-based harvesting API
Any web-based UI can be crawled when no better solution is available
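
To make the first of these methods concrete, here is a minimal sketch of one OAI-PMH harvesting step in Python. It is an illustration only, not The One Repo's actual code: the function names and the example.org endpoint are hypothetical, and it assumes the repository exposes records in the standard oai_dc (Dublin Core) format. It builds a ListRecords request and pulls identifiers, titles and the resumption token out of a response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard XML namespaces for OAI-PMH 2.0 and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    When continuing a harvest, OAI-PMH expects the resumption token
    to be sent instead of metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means the harvest is complete.
    return records, (token.strip() or None)


# A hand-written sample response, standing in for a real repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample manuscript</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_list_records(SAMPLE)
    print(records, token)
    print(list_records_url("http://example.org/oai", resumption_token=token))
```

A full harvester would fetch each URL, call parse_list_records on the body, and repeat with the returned token until it comes back as None.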

Similarly, real-time searching works by means of several methods:

The ANSI/NISO Z39.50 protocol
The SRU family of web-service searching protocols
The Solr protocol
Any XML-based searching API
Any web-based UI can be screen-scraped
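
As an illustration of real-time searching, the sketch below shows roughly what an SRU request looks like from the client side. It is not taken from The One Repo: the helper names and the example.org endpoint are hypothetical, and it assumes an SRU 1.2 server that answers in the usual srw response namespace. One function builds a searchRetrieve URL from a CQL query, the other reads the hit count from a response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by SRU 1.1/1.2 searchRetrieve responses.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def sru_search_url(base_url, cql_query, start_record=1,
                   maximum_records=10, version="1.2"):
    """Build an SRU searchRetrieve URL (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,            # the query is expressed in CQL
        "startRecord": start_record,   # SRU record positions start at 1
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


def sru_hit_count(xml_text):
    """Read numberOfRecords from a searchRetrieve response body."""
    root = ET.fromstring(xml_text)
    text = root.findtext("srw:numberOfRecords", default="", namespaces=SRW_NS)
    return int(text or "0")


# A hand-written, truncated sample response for demonstration.
SAMPLE_RESPONSE = """<srw:searchRetrieveResponse
    xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.2</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
</srw:searchRetrieveResponse>"""

if __name__ == "__main__":
    print(sru_search_url("http://example.org/sru", 'dc.title = "open access"'))
    print(sru_hit_count(SAMPLE_RESPONSE))
```

The same pattern (build a request, parse a response into a common shape) would apply to the other protocols in the list, with only the request syntax and response parsing changing.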

All databases are treated essentially equally within the One Repo, and a uniform web service API is provided by which any of them can be searched.
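
One way to picture this uniformity is the sketch below, in which harvested and live-searched databases sit behind the same search interface. Everything in it is an assumption made for illustration (the Record type, the backend factories, the in-memory "index" and the fake remote search); it says nothing about how The One Repo is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    source: str  # name of the database the record came from
    title: str


# A backend is anything that maps a query string to a list of records.
Backend = Callable[[str], List[Record]]


def make_harvested_backend(name: str, index: List[str]) -> Backend:
    """Search a locally harvested copy (here just a list of titles)."""
    def search(query: str) -> List[Record]:
        q = query.lower()
        return [Record(name, title) for title in index if q in title.lower()]
    return search


def make_live_backend(name: str,
                      remote_search: Callable[[str], List[str]]) -> Backend:
    """Wrap a real-time search of a remote system (Z39.50, SRU, etc.)."""
    def search(query: str) -> List[Record]:
        return [Record(name, title) for title in remote_search(query)]
    return search


def search_all(backends: Dict[str, Backend], query: str) -> List[Record]:
    """Run one query against every database and pool the results."""
    results: List[Record] = []
    for backend in backends.values():
        results.extend(backend(query))
    return results


if __name__ == "__main__":
    def fake_remote(query: str) -> List[str]:
        # Stand-in for a network call to a remote repository.
        return ["Remote paper on " + query]

    backends = {
        "harvested-repo": make_harvested_backend(
            "harvested-repo", ["Sauropod necks", "Open access policy"]),
        "live-repo": make_live_backend("live-repo", fake_remote),
    }
    for record in search_all(backends, "open access"):
        print(record.source, "|", record.title)
```

The caller of search_all never needs to know which databases were harvested and which were searched live, which is the property the paragraph above describes.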

5 responses to “2. The solution”

Pingback: 1. The Problem | The One Repo blog
David Wojick | 23 May, 2015 at 1:31 pm

How do you propose to ensure perpetual operation? Endowment?

You may have a PR problem in the USA, where repo is short for repossession, usually of a car by the bank’s repo man.

- Mike Taylor | 24 May, 2015 at 4:24 pm
  
  At this point, David, we are considering all sorts of different options for funding.
  
  Interesting point on the word “repo”.
  
Calvin Sadowski | 9 July, 2015 at 4:22 pm

I like the concept, but what would the core record structure of your database be: Dublin Core? When I scan the different repositories, one of the key things I notice is that they all have vastly different structures for a “record”.

- Mike Taylor | 9 July, 2015 at 5:32 pm
  
  Calvin, your question is one of the key ones! We’ve convened a small, informal working group to discuss just this. There is plenty of prior art: for example RCUK’s RIOXX Profile includes some of the necessary additional fields. But we may need to merge this and other existing profiles in order to have the expressiveness to capture all relevant fields. We’ll blog about this as we make progress.
  
