DrugDev is a leading provider of technology solutions that help investigators, sponsors and CROs do more trials together.

As part of a novel industry data-sharing collaboration, DrugDev approached us to develop a shared repository of clinical trial investigators and sites. The repository was to draw its data from an industry consortium of leading pharma companies, including J&J, Pfizer, Lilly, Novartis and Merck.
An important technical consideration in developing the repository was ensuring the privacy of site and investigator data according to the sharing permissions granted by investigators and companies.

The investigator and clinical trial site information that member companies provided for the repository varied in completeness, so our first challenge was to fill the gaps. We developed methods to gather data from each member company and performed ETL (extract, transform, and load). We then applied record linkage algorithms, combined with natural language processing and semantic technologies, to match records at both the site and the investigator level. Finally, we built a search interface that lets member companies query the combined site/investigator database. This linkage became the core of the DrugDev Golden Number, a universal identifier for persons and facilities that solves the problem of integrating data across sources.
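The case study does not describe the matching algorithms in detail. As a minimal sketch, assuming plain fuzzy string matching (Python's standard `difflib`) as a stand-in for the NLP and semantic techniques mentioned above, the code below links investigator records from different member companies and gives each matched group one shared identifier. The `InvestigatorRecord` fields, the similarity thresholds and the `GN-` identifier format are all hypothetical and are not the actual Golden Number scheme.

```python
import difflib
import re
import unicodedata
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class InvestigatorRecord:
    source: str   # hypothetical: contributing member company
    name: str     # investigator name as supplied by that company
    site: str     # site/facility name as supplied
    country: str  # country code, used here as a cheap blocking key


def normalize(text: str) -> str:
    """Strip accents, titles and punctuation; sort tokens so word order does not matter."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = text.lower()
    text = re.sub(r"\b(dr|prof|md|phd)\b\.?", " ", text)  # drop common titles
    text = re.sub(r"[^a-z0-9 ]", " ", text)                # punctuation -> spaces
    return " ".join(sorted(text.split()))


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] between two normalized strings."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a: InvestigatorRecord, b: InvestigatorRecord,
             name_threshold: float = 0.9, site_threshold: float = 0.8) -> bool:
    """Hypothetical rule: same country, similar name and similar site."""
    return (a.country == b.country
            and similarity(a.name, b.name) >= name_threshold
            and similarity(a.site, b.site) >= site_threshold)


def assign_golden_ids(records: list[InvestigatorRecord]) -> list[tuple[str, InvestigatorRecord]]:
    """Greedy linkage: reuse the ID of the first matching cluster, else mint a new one."""
    representatives: list[tuple[str, InvestigatorRecord]] = []
    new_id = count(1)
    linked = []
    for rec in records:
        for gid, rep in representatives:
            if is_match(rec, rep):
                break  # gid now holds the matched cluster's identifier
        else:  # no existing cluster matched
            gid = f"GN-{next(new_id):06d}"
            representatives.append((gid, rec))
        linked.append((gid, rec))
    return linked


records = [
    InvestigatorRecord("CompanyA", "Dr. Jane Smith", "St. Mary's Hospital", "US"),
    InvestigatorRecord("CompanyB", "Smith, Jane MD", "St Mary's Hospital", "US"),
    InvestigatorRecord("CompanyC", "John Doe", "General Clinic", "DE"),
]
for gid, rec in assign_golden_ids(records):
    print(gid, rec.source, rec.name)
```

Here the two differently formatted Jane Smith records normalize to the same name and site, so they share one identifier while John Doe gets his own. Comparing each record against a single representative per cluster keeps the sketch short. A production linker would typically also use blocking on more fields, compare against all cluster members and handle transitive matches.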
Data from each company was stored in a Virtual Private Database (VPD), and all privacy sharing rules were applied before the data became visible to other members. The system ensured that three conditions were met before data could be shared:



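The three sharing conditions are not listed in the source, so the sketch below does not attempt to reproduce them. It only illustrates the row-level idea behind applying privacy rules before data becomes visible to other members. It assumes each record carries two hypothetical flags standing for the investigator and company sharing permissions mentioned earlier: a member always sees its own rows and sees another member's rows only when both permissions are granted. This is plain Python, not Oracle's VPD API. In an Oracle VPD deployment, a comparable rule would be enforced inside the database by a policy function registered with the `DBMS_RLS` package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    owner: str                  # member company that contributed the record
    investigator: str
    investigator_consent: bool  # hypothetical: investigator allowed sharing
    company_consent: bool       # hypothetical: owning company allowed sharing


def can_view(viewer: str, record: SiteRecord) -> bool:
    """Row-level rule: owners see their own rows; others need both permissions."""
    if record.owner == viewer:
        return True
    return record.investigator_consent and record.company_consent


def visible_records(viewer: str, records: list[SiteRecord]) -> list[SiteRecord]:
    """Apply the sharing rule to every row before anything reaches the viewer."""
    return [r for r in records if can_view(viewer, r)]


site_records = [
    SiteRecord("CompanyA", "Jane Smith", True, True),
    SiteRecord("CompanyA", "John Doe", True, False),
    SiteRecord("CompanyB", "Ana Lopez", False, True),
]
for viewer in ("CompanyA", "CompanyB"):
    print(viewer, [r.investigator for r in visible_records(viewer, site_records)])
```

Filtering at the data-access layer, rather than in each query or screen, means a new search feature cannot accidentally bypass the sharing rules.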
[Figure: repository coverage, showing investigators with research activity since 2008, protocols, sites and patients enrolled]
Decreased administrative burden
Expanded access to clinical research opportunities
Website access for member investigators to view, edit and comment on their own profiles
Improved protocol planning and better-informed country selection
Increased access to investigators for feasibility and site identification
Rapid recruitment and better matching of investigators to protocols
Faster site start-up by sharing standard documents and generic information
