Are Your Data Cleaning Cycle Times Out of Control?

The quality and ultimate reliability of clinical data collected from investigative sites during a study depend on many factors. One area that may not get enough attention is the cycle time from subject visit through eCRF data capture to final review and cleaning of that data. Site personnel must transcribe the source data generated during subject visits into the sponsor-provided eCRF system, and that step can introduce long delays. The longer the delay, the more memory fades, which can compromise the correct interpretation of information originally entered into the patient chart. Once the eCRF data has been captured, further delays in review and querying of that data by sponsor personnel (e.g., data managers and site monitors) will only exacerbate this issue. Over the many years I spent in clinical operations, I became an increasingly strong advocate of driving towards optimal cycle time efficiency in the capture and cleaning of clinical data – not only to improve data quality but also to shorten the time to database locks, reporting and analyses.

Our Medidata Insights metrics warehouse includes a number of metrics focused on these data capture and cleaning cycle times. The included graph shows the industry trend for two of these cycle times in particular:

  • Subject visit through final closure of queries related to that visit (i.e., overall data capture through cleaning cycle time)
  • eCRF entry through generation of data manager queries
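
For readers who want to track the same two cycle times on their own studies, here is a minimal sketch of the arithmetic in Python. It is an illustration only: the field names, the sample dates and the function `median_cycle_times` are all hypothetical, and it is not a description of how the Medidata Insights metrics are actually computed.

```python
from datetime import date
from statistics import median

# Hypothetical records, one per subject visit. Field names and dates are
# made up for illustration; they do not come from any real study.
visits = [
    {"visit": date(2012, 1, 2), "ecrf_entry": date(2012, 1, 9),
     "query_opened": date(2012, 1, 23), "last_query_closed": date(2012, 2, 1)},
    {"visit": date(2012, 3, 5), "ecrf_entry": date(2012, 3, 8),
     "query_opened": date(2012, 3, 28), "last_query_closed": date(2012, 4, 9)},
    {"visit": date(2012, 6, 11), "ecrf_entry": date(2012, 6, 21),
     "query_opened": date(2012, 7, 1), "last_query_closed": date(2012, 7, 21)},
]


def median_cycle_times(records):
    """Return the median of each cycle time, in calendar days."""
    # Subject visit through final closure of queries related to that visit.
    overall = [(r["last_query_closed"] - r["visit"]).days for r in records]
    # eCRF entry through generation of a data manager query.
    entry_to_query = [(r["query_opened"] - r["ecrf_entry"]).days for r in records]
    return {
        "visit_to_query_closure": median(overall),
        "entry_to_query": median(entry_to_query),
    }


print(median_cycle_times(visits))
```

The sketch assumes every visit has at least one data manager query and counts calendar days rather than business days; a real report would need to decide how to treat visits with no queries or with queries still open.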


The graph presents the industry median values for Phase II and III studies from 2009 through 2012. The overall data capture and cleaning cycle time efficiency does show a positive downward trend over this period – from 36 days in 2009 to 33 days in 2012. It actually dipped below 30 days in 2011 before rising again. This certainly supports the notion that organizations have been putting increased focus on these processes in recent years, and in particular we’re noticing a positive downward trend in the time from subject visit to eCRF entry by sites.

The trend in time from eCRF entry to generation of data manager queries is very interesting since it correlates closely with the overall capture and cleaning cycle time trend. In particular, it appears to be trending downward in the years prior to 2012 and then back up again in 2012. While the evidence is not yet conclusive, the 2012 “bounce” in this metric is a very likely driver of the similar upward bounce in the overall cycle time metric. What continues to amaze me is the actual length of that cycle time (right-hand axis on the graph), where the annual benchmark for the industry ranges from 59 to 89 days! A lag of that length before data management personnel generate queries to the site can hardly be good for data quality.

This reminds me of an ongoing—and sometimes heated—debate we had in my previous organizations regarding the proper cadence of sponsor team data reviews. There were essentially two distinct schools of thought – one that insisted data managers wait until site monitors have completed their source document verification (SDV) review of the eCRF data before conducting their reviews, and the other that insisted data managers should conduct their reviews as soon as the data was available in the eCRF system regardless of the status of site monitor review. It is apparent from the current industry benchmark that most organizations are still tied to the “monitors first” approach.

For the record, I was and continue to be passionately in the latter camp (i.e., parallel data reviews), for all of the reasons discussed above. Am I right? We would love to hear your thoughts on this very important topic.

More about Stephen Young

About Author

Medidata Solutions