The Force – Meet the Technical Side of ScienceOpen

ScienceOpen has been promoting Open Science from the beginning. For years we have helped advance this goal by supporting researchers and publishers in making science more visible, accessible, and reproducible. With this, we aim to answer the global call(s) for openness and offer solutions that can benefit all.

The technical backbone

The ScienceOpen platform provides a unique, advanced indexing, hosting, and publishing environment that is freely accessible and embedded within an interactive discovery and communication infrastructure of more than 60 million publication records, including journal articles, conference papers, open peer reviews, and preprints. It also offers researchers free poster and preprint publication (including versioning).

Last year, we launched the UCL Open publishing platform within our framework for the multidisciplinary journal UCL Open: Environment. In close collaboration with our partner UCL Press, we created an alternative space for new modes of community curation of scientific content.

The platform has also gained a new infrastructure branch for books and book chapters. This essential advancement gives our researchers, customers, and users an additional channel to promote and discover relevant content and to expand their portfolios or profiles.

We have also just extended our standardized metadata sets to include Data Availability Statements for the preprints and posters we publish. Data Availability Statements are now also an optional feature for all articles in the database: registered authors can manage, enhance, and share them as part of the freely available (meta)data of their own publications.

Our architecture therefore not only reflects state-of-the-art publishing technology in line with industry best practices and standards, but is also built on open source modules embedded in the framework in a way that ensures scalability for future developments. With currently 61 million publication records, 26 million author records, and 25,000 journals on the platform, we want to use this opportunity to introduce the technical backbone of ScienceOpen. The core of our platform, comprising our policies and infrastructure developments, is founded on the principles of openness and transparency, data persistence and portability, and privacy and security. Interoperable taxonomies were therefore central to its structural make-up from the beginning:

Cloud-based platform

ScienceOpen's technological infrastructure is hosted on an internal private cloud built on the VMware vSphere platform. The cloud infrastructure uses a cluster of servers, each running the vSphere Hypervisor and all centrally managed by a vCenter Server. This kind of cloud infrastructure gives the ScienceOpen platform greater scalability and resiliency.

Besides this server cluster for the private cloud, the system relies on a set of NAS (network-attached storage) devices that host terabytes of data. The NAS devices are configured with multiple levels of redundancy and data mirroring for consistency and scalability, and they store the file data, the search index, and the databases. Platform hosting is provided by Ovitas, Inc., USA.

Open Source Technologies

ScienceOpen is a multi-tiered application organized into functional components. Each tier uses different Open Source technologies to create a robust and scalable architecture:

Overview of functional components and their Open Source technologies

Open and standardized data and metadata formats for simple data transfer

Our XML data and metadata are created based on current versions of the NCBI standardized JATS and BITS DTDs. ScienceOpen is a member of Crossref and Metadata2020, both of which provide support and guidance on generating rich, best-practice metadata. Wherever available, we work with standardized identifiers such as Crossref and DataCite DOIs, ORCID iDs, PubMed IDs, or FundRef, which keep data persistent and easy to exchange between platforms. Users can also export data directly in standard reference formats: BibTeX, Reference Manager (RIS), and EndNote.
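To illustrate how standardized JATS metadata can feed one of these export formats, here is a minimal sketch that pulls the title, authors, year, and DOI out of a JATS `<article-meta>` block and writes them as an RIS record. It is not ScienceOpen's export code: the sample record, its DOI, and the helper name `jats_to_ris` are hypothetical, and real JATS files carry many more elements (namespaces, several date types, group authors) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS record used only for illustration.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.5678</article-id>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
      <pub-date><year>2021</year></pub-date>
    </article-meta>
  </front>
</article>"""


def jats_to_ris(xml_text: str) -> str:
    """Map a few JATS front-matter fields onto RIS tags (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    meta = root.find("./front/article-meta")
    if meta is None:
        raise ValueError("no <article-meta> element found")

    lines = ["TY  - JOUR"]  # RIS record type: journal article
    title = meta.findtext("./title-group/article-title")
    if title:
        lines.append(f"TI  - {title.strip()}")
    # One AU line per author, in document order, written as "Surname, Given".
    for name in meta.findall("./contrib-group/contrib[@contrib-type='author']/name"):
        surname = name.findtext("surname", default="").strip()
        given = name.findtext("given-names", default="").strip()
        lines.append(f"AU  - {surname}, {given}" if given else f"AU  - {surname}")
    year = meta.findtext("./pub-date/year")
    if year:
        lines.append(f"PY  - {year.strip()}")
    doi = meta.findtext("./article-id[@pub-id-type='doi']")
    if doi:
        lines.append(f"DO  - {doi.strip()}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


if __name__ == "__main__":
    print(jats_to_ris(SAMPLE_JATS))
```

Running it prints a seven-line record that starts with `TY  - JOUR` and ends with `ER  - `. Because JATS already separates surnames, given names, and identifiers into their own elements, the mapping needs no text parsing; BibTeX or EndNote output could be produced from the same fields.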

Data portability between systems and user authentication  

The website embeds essential publication metadata in meta tags that are compatible with Google and Google Scholar. Google, Google Scholar, Baidu, and Baidu Academic currently harvest data from the ScienceOpen environment. We also have interfaces with Crossref, ORCID, PubMed, the Directory of Open Access Journals (DOAJ), DataCite, Zenodo, and others. All preprints and articles we publish are registered with Crossref DOIs, including rich metadata deposits and machine-readable Creative Commons licenses. Open Access full-text XML and/or PDFs can be downloaded directly from the platform.
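Google Scholar's inclusion guidelines describe bibliographic meta tags in the Highwire Press style (`citation_title`, `citation_author`, and so on). A minimal sketch of rendering such tags from a publication record could look like the following. The record layout and the helper name `scholar_meta_tags` are hypothetical, the tag subset is deliberately small, and nothing here is taken from ScienceOpen's actual page templates.

```python
from html import escape


def scholar_meta_tags(record: dict) -> str:
    """Render Highwire Press-style citation meta tags for one publication record.

    The record keys used here are hypothetical; only "title" is required.
    """
    tags = [("citation_title", record["title"])]
    # One citation_author tag per author, kept in author order.
    for author in record.get("authors", []):
        tags.append(("citation_author", author))
    optional = [
        ("citation_publication_date", "publication_date"),
        ("citation_journal_title", "journal_title"),
        ("citation_pdf_url", "pdf_url"),
    ]
    for tag_name, key in optional:
        if record.get(key):
            tags.append((tag_name, record[key]))
    # Escape values so characters such as & or " cannot break the HTML attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(str(value), quote=True)}">'
        for name, value in tags
    )


if __name__ == "__main__":
    print(scholar_meta_tags({
        "title": "Data & Metadata",
        "authors": ["Doe, Jane", "Roe, Richard"],
        "publication_date": "2021",
        "pdf_url": "https://example.org/article.pdf",
    }))
```

Optional fields are simply skipped when a record lacks them, so the same function serves preprints, posters, and journal articles with differing metadata.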

ScienceOpen user identification is based on ORCID. Registered user profile details are drawn directly from publicly available ORCID data. Data added to ScienceOpen may be copied directly to the user’s ORCID profile to ensure maximum portability and reduce redundancy.
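Since user identification rests on ORCID, a platform may want to confirm that an ORCID iD is well formed before using it to look up public profile data. ORCID iDs consist of 16 characters whose last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, where `X` stands for the value 10. The sketch below implements that check; the function names are hypothetical, and it does not describe how ScienceOpen itself validates iDs. The sample iD 0000-0002-1825-0097 is a widely used example record.

```python
import re

# Four hyphen-separated groups of four characters; only the very last may be "X".
ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

The checksum catches typing errors such as a single wrong digit locally, before any request to ORCID is made; whether an iD actually belongs to a registered person can only be confirmed by ORCID.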

Time to meet the force that drives the platform – the ScienceOpen developer and technical lead:

The architecture behind the ScienceOpen platform is built on Ovitas' design and development work, which is also used in the global portal platforms of two of the Big Four accounting firms and of the authoritative source of US GAAP. The two most recent software generations were developed by Ovitas Hungary. Ovitas provides outstanding, internationally acknowledged products and services in the field of content lifecycle management (CMS and workflow solutions, including bridge processes for dynamic multi-channel publishing and information/communication portals). As a global market leader in the sector, the company offers cutting-edge, customizable, standards-based solutions for enterprises handling vast amounts of dynamically changing, long-lasting information. With its many years of experience and excellence in working with databases and taxonomy software, Ovitas is the perfect match as ScienceOpen's development partner.

Having had the pleasure of working closely with this group of young and dedicated developers for the past three years, I especially look forward to our team meetings. Inspiring ideas easily come to fruition when we put our heads together, sharing offices and expertise to find the best-suited ways of realizing the platform's future road maps. While looking forward to picking their brains some more in the future, I am honored to introduce to you the head developers of the ScienceOpen Budapest team: our Technical Lead Péter, our Senior Web Developer Bálint, and Tihamér, our Senior Data Conversion Analyst.


Headline image source: U.S. Department of Defense, Military Review, October 1972, Vol. 52, No. 10, Page 97., public domain.
Graphics source: ScienceOpen
