Authors
John C. Hoag, Ohio University

Abstract
Designs for large-scale computing in commercial and even scientific settings, including virtual and cloud models, do not serve sequencing tasks well. This research is developed from the perspective of computation, networking, and storage, and it posits that NGS execution requirements drive infrastructure in a different direction. Alternatively, current environments limit inquiry to smaller models that may lack insight and capability; such cases are optimized by in-memory computing, which does not scale.
This research commences with an analysis and taxonomy of components and systems, noting how they are misaligned with pipelines and thus contribute both to latency and underutilization. The scope of this task includes caching, parallel processing, GPU usage, and storage-area networking, as well as hypervisor types and their interaction with commodity cloud storage. NGS tools in this analysis include the Burrows-Wheeler Aligner and Novoalign.
The goal of this research is to develop an approach and infrastructure around the Galaxy Project, with the intention of acquiring and operating assets to support research and instruction, sensitive to workloads and the need to scale. The project methodology is to discern functional and performance requirements, assess materiel on hand (based on a donated compute cluster), specify remaining components, and prepare for operations. Engagement with the Galaxy community is essential to this endeavor.