Using NHLBI BioData Catalyst for Workshops and Courses

Overview

There are many challenges associated with teaching genomics research concepts to students in a hands-on workshop or course format. Course instructors need mechanisms to share course materials with the students, including test data, methods, and example analyses.

Course instructors also need to provide students with up-to-date software packages for carrying out the hands-on exercises. Finally, course instructors benefit from the ability to monitor student progress and review final work. All of this becomes even more challenging when courses and workshops are held virtually.

Likewise, students need access to the course materials (including the files and code) as well as a compute resource with sufficient computational power. Students may require technical help from course instructors as they work through the exercises.

BioData Catalyst provides a solution for the challenges described above, offering course instructors and students the necessary technical infrastructure for teaching and learning. This lets the instructors and students focus on the science instead of the technical infrastructure.

BioData Catalyst Success Stories

The students in the stories below learned from the coursework and kept their NHLBI BioData Catalyst accounts after the course ended, so they can easily continue working on future research projects in BioData Catalyst.

UW Summer Institutes in Statistical Genetics

The University of Washington (UW) holds an annual summer program for students to learn statistical genetics concepts. As part of this program, UW instructors Ken Rice, Tim Thornton, Stephanie Gogarten, and Matt Conomos lead a three-day course to teach the methodology behind association testing. This hands-on course was historically held on-site at UW, and participants had to set up an RStudio environment on their own computers in order to work through the lessons.

The course instructors began using NHLBI BioData Catalyst for the hands-on lessons in 2019. The registered participants were instructed to create BioData Catalyst platform accounts in advance of the workshop and learn the basics for using the platform.
The course instructors set up a public project called “GENESIS Tutorial” with all of the files and R code needed for the lessons. This public project is available to all users on the platform.

The course has been offered successfully to almost 250 students in the last three years, with an average cloud cost of about $10 per student. The cloud costs were covered by NHLBI each year.

The Seven Bridges team provided support to participants prior to and during the workshop, assisting with account setup and troubleshooting. This enabled the course instructors to focus on the content and teaching while the Seven Bridges team focused on the infrastructure and platform account logistics.

Genomics course for American Thoracic Society Annual Meeting

In June 2021, NHLBI BioData Catalyst was used to teach a hands-on introductory genomics analysis course as part of the American Thoracic Society (ATS) annual meeting. The course instructors needed to limit distribution of the course materials to only the course participants, since they had paid a registration fee. To limit distribution, the course instructors set up a private platform project in BioData Catalyst with all of the files and code needed for the lessons. Once the course materials were finalized, the Seven Bridges Support Team created copies of the project for each individual student, giving them their own sandbox to work in. There were approximately 50 students for this course.

The course instructors made recorded lectures available on the ATS meeting website from May 14 to July 2, 2021. These recorded lectures showed the students how to carry out hands-on exercises on BioData Catalyst. Prior to May 14, the Seven Bridges team instructed all of the registered students to create platform accounts. The Seven Bridges team also distributed tutorials on the foundational features of the platform and on how to open and use the ATS Course Materials project.

Procedure

Setting up course materials on the platform

Course and workshop organizers can share data and code with participants by using platform projects. The organizers can choose to make the course materials available to all users on the platform or limit distribution to the registered participants.

Distributing materials to specific participants

In cases where there is a registration fee for a workshop or course, organizers may want to limit distribution of the course materials to only those individuals who registered for the course. In this case, organizers should set up a Course Materials project with all the needed data and methods.

For small courses of fewer than 20 participants, the participants can be added as members of the Course Materials project. All course participants can work together in the same project and view each other’s analyses.

It becomes difficult to organize the work of more than 20 people in one project. For that reason, we recommend giving each participant their own copy of the Course Materials project when there are more than 20 participants in a course. Instructors and support staff can be added as members of the project as needed for assistance.

Sharing materials with all platform users

In cases where the workshop or course materials could benefit researchers beyond those who registered for the event, the materials can be made available to BioData Catalyst users through a “public project.” Public projects are listed in the top navigation bar of the platform. Organizers set up a private project with all of the needed data and methods for the event. Once the project is finalized, the Seven Bridges team creates a “public project” using the content provided. Public projects can only contain open access data that can be shared with all platform users.

Files

Course instructors should provide students with test files for the course. These test files should be added to the Files tab of the project. Course instructors can upload/import files to the platform, following BioData Catalyst data protection policies and these steps.

Open access hosted datasets can also be used. For example, BioData Catalyst hosts files from the 1000 Genomes dataset and makes these available to all users on the platform. More information about the hosted datasets can be found on the Data page of the BioData Catalyst website.

Analysis methods

Course instructors can make methods available within projects using interactive notebooks in the Data Studio feature, Common Workflow Language (CWL) tools and workflows, or both. If course instructors want to expose all of the code in a set of methods, they can set up RStudio, JupyterLab, or SAS notebooks.

To access the code, participants launch the notebook. If course instructors want to show participants how to scale up analyses (batch processing) and ensure reproducibility, they can include CWL versions of the methods in the project as well. For more information on CWL, see this blog post on working with CWL.

Determine how cloud costs will be supported

Participants will incur cloud costs while using BioData Catalyst. In order to create projects and run analyses on the platform, each participant must be a member of a platform billing group. A billing group provides a mechanism for capturing and paying for participant cloud costs.

We recommend that course organizers use one platform billing group to support the cloud costs of all event participants. The Seven Bridges team creates the billing group in advance of the workshop and ensures that all participants are set up as “members” of the billing group.

Course or workshop organizers have two options for supporting the costs of the billing group:

  • Apply for NHLBI Cloud Credits
  • Set up a payment mechanism with Seven Bridges

Apply for NHLBI Cloud Credits

NHLBI offers a cloud credits program for heart, lung, blood, and sleep researchers. Individual researchers can request $500 in pilot cloud credits. In addition, researchers can request project-based cloud credits. We recommend applying for project-based cloud credits to support the cost of multiple participants in a workshop or course.

Steps:

  1. Estimate the cloud costs for the event
    a) Set up data and code on BioData Catalyst
    b) Run through the exercises to determine the expected cloud costs for individual participants
    c) Estimate the total costs by scaling up to the expected number of participants
    d) Refer to the tutorial on Estimating and Managing Cloud Costs
    e) Reach out to [email protected] if you need assistance
  2. Submit application for cloud credits
    a) Go to the BioData Catalyst website page on Cloud Costs and Credits
    b) Submit an application for “Additional cloud credits”
  3. Seven Bridges will be notified if your application is approved. If approved, Seven Bridges will create a platform billing group with the approved cloud credit amount. You will be notified of the status of your application and again once your billing group is available.
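
The scaling in step 1 is simple arithmetic and can be sketched in a few lines of Python. This is a minimal sketch, not a platform tool: the function name, the buffer parameter, and the example figures are illustrative assumptions. The per-participant cost should come from your own run-through of the exercises.

```python
def estimate_event_cost(per_participant_usd, participants, buffer_fraction=0.25):
    """Scale a measured per-participant cloud cost up to the whole event.

    buffer_fraction is a hypothetical safety margin for reruns and idle
    notebook time; it is an assumption, not a platform recommendation.
    """
    if per_participant_usd < 0 or participants < 0 or buffer_fraction < 0:
        raise ValueError("all inputs must be non-negative")
    base_cost = per_participant_usd * participants
    return round(base_cost * (1 + buffer_fraction), 2)


# Illustrative figures only: $10 per participant, 50 participants.
print(estimate_event_cost(10.0, 50))
```

The resulting total is the figure to request in the cloud credits application or to use when setting up a payment mechanism.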

Set up a payment mechanism with Seven Bridges

Researchers can pay for their own cloud costs on BioData Catalyst by providing a purchase order number or a credit card. Course instructors can work with the Seven Bridges team to set up a billing group. Course instructors have the choice to be invoiced after the workshop completes or to pre-pay.

Steps:

  1. Estimate the cloud costs for the event
    a) Set up data and code on BioData Catalyst
    b) Run through the exercises to determine the expected cloud costs for individual participants
    c) Estimate the total costs by scaling up to the expected number of participants
    d) Refer to the tutorial on Estimating and Managing Cloud Costs
    e) Reach out to [email protected] if you need assistance
  2. Email [email protected] and indicate that you would like to host a workshop or course on the platform and need a platform billing group. The support team will inform the BioData Catalyst Program Manager, who will reach out to you with next steps.

Platform accounts for participants

Course organizers manage registration for the course. The Seven Bridges team manages the process of course participants getting set up with accounts on BioData Catalyst.

2 weeks before the event

Course organizers send the Seven Bridges Team the final list of participants and their email addresses.

The Seven Bridges Team emails the participants instructions on how to create accounts. Participants are asked to send their platform usernames to the Seven Bridges Team. This step confirms that the individual has completed account creation and enables the Seven Bridges Team to add the individual to the platform billing group that will be used to support cloud costs for the event.

1 week before the event

For participants who have not yet created platform accounts:

  • Seven Bridges Team follows up with a reminder email.

For participants who have created platform accounts:

  • Seven Bridges Team adds them to the platform billing group that will be used to support the cloud costs for the event.
  • Seven Bridges Team distributes an introductory tutorial and asks participants to work through the instructions prior to the event.

Day of the event

A Seven Bridges Community Engagement Manager is available to ensure the students are successful. The Seven Bridges Support Team is also available to troubleshoot any issues that may come up.
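
Organizers who want concrete calendar dates for the timeline above can derive them from the event date. The following is a minimal sketch using only the Python standard library. The function name, the milestone labels, and the example date are hypothetical and are not part of the platform.

```python
from datetime import date, timedelta


def workshop_milestones(event_day):
    """Return the dates of the account-setup milestones for an event."""
    return {
        # Organizers send the final participant list; account instructions go out.
        "final_participant_list": event_day - timedelta(weeks=2),
        # Reminder emails, billing group membership, introductory tutorial.
        "reminders_and_billing_group": event_day - timedelta(weeks=1),
        # Community engagement and support staff are on hand.
        "event_day_support": event_day,
    }


# Hypothetical event date, for illustration only.
for label, day in workshop_milestones(date(2025, 6, 16)).items():
    print(day.isoformat(), label)
```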

For more information

If you are interested in using NHLBI BioData Catalyst for a workshop or course, please write to us at [email protected]. We are eager to help you get the most out of the platform.