Remote Computing

Getting things you need

Screen Recording

Reading

  • Textbook: Working with Remote Machines.p 57-64

  • Lab Handbook - Mox Guide (including all subpages under Mox Guide)

Objectives

Gain confidence and skill set to access remote computers from the command-line.

Remote Machine Access

The ability to remotely access computing resources is highly valuable for graduate courses, as it enables students and researchers to access powerful computational facilities and collaborate effectively from anywhere. This remote access enhances learning experiences, promotes research progress, and optimizes resource utilization.

Key benefits include:

  1. Access to powerful computing resources: Remote access allows students and researchers to work with high-performance computing (HPC) clusters, specialized software, and large-scale data storage without needing to invest in expensive personal hardware.

  2. Flexible learning and research: Remote access provides students the freedom to work from any location with internet connectivity, enabling them to fit their studies around personal commitments and schedules.

  3. Collaboration and resource sharing: Remote access facilitates collaboration between researchers from different institutions, allowing them to share data, code, and other resources seamlessly.

  4. Cost-effectiveness: By centralizing computational resources, institutions can reduce costs associated with maintaining multiple systems and software licenses.

  5. Enhanced security: Remote computing resources often have robust security measures in place, ensuring that sensitive data and intellectual property are protected.


Example of SSH to log into a remote machine:

Secure Shell (SSH) is a widely used protocol for secure remote access to computers. To log into a remote machine using SSH, follow these steps:

  1. Open a terminal or command prompt on your local machine.

  2. Type the following command, replacing ‘username’ with your remote account username and ‘remote_host’ with the remote machine’s address (e.g., IP address or domain name): ssh username@remote_host

  3. Press Enter. You may be prompted to accept the remote host’s key fingerprint if this is your first time connecting.

  4. Enter your password when prompted. After successful authentication, you will be connected to the remote machine and can begin executing commands.

Hyak

The Hyak computing system at the University of Washington (UW) is a high-performance computing (HPC) cluster designed to support the research and computational needs of UW students, faculty, and staff. Hyak provides advanced computing capabilities to facilitate complex scientific and engineering simulations, data analysis, and other computationally-intensive tasks.

Key features of the Hyak computing system include:

  1. Scalability: Hyak is designed to be a highly scalable system, accommodating a wide range of workloads, from small-scale computations to large-scale parallel processing jobs.

  2. Parallel processing capabilities: The system is equipped with multiple interconnected compute nodes, each with multiple CPU cores and GPUs, enabling efficient parallel computing for tasks that can be distributed across multiple processors.

  3. High-speed storage: Hyak offers high-speed storage options, including parallel file systems and high-performance storage tiers, to support data-intensive workloads.

  4. High-speed interconnects: The system employs low-latency, high-bandwidth interconnects between compute nodes to ensure efficient data transfer and communication during parallel processing tasks.

  5. User-friendly environment: Hyak offers a familiar Linux-based environment, supporting various programming languages, libraries, and tools commonly used in scientific research.

  6. Resource allocation and management: Hyak uses a job scheduler (e.g., Slurm) to manage resource allocation and user access, ensuring fair usage and optimal utilization of computing resources.

Bash and remote computing

When working with remote computing, several Bash commands can be particularly useful for managing files, navigating directories, and monitoring system resources. Here are five of the most useful Bash commands:

  1. ssh: The ssh command (Secure Shell) is crucial for establishing secure remote connections to other computers. It allows you to log in and execute commands on remote machines. Usage: ssh username@remote_host

  2. scp: The scp command (Secure Copy) enables you to transfer files securely between your local machine and a remote machine, or between two remote machines. Usage: scp source destination, where source and destination can be either local or remote paths, such as username@remote_host:/path/to/file.

  3. rsync: The rsync command is a versatile tool for synchronizing files and directories between local and remote machines. It optimizes data transfer by only transferring the differences between source and destination files. Usage: rsync options source destination, where source and destination can be either local or remote paths.

  4. top or htop: The top command provides a real-time, dynamic view of the processes running on a system, including remote machines. It displays information about CPU usage, memory usage, and other system metrics. The htop command is an enhanced version of top, providing a more user-friendly interface with additional features. To use it on a remote machine, simply run ssh username@remote_host top or ssh username@remote_host htop.

These commands, among others, are essential for efficiently working with remote computing resources and managing your workflow when connected to remote machines.

Job Scheduler

A job scheduler, also known as a workload manager or batch scheduler, is a software tool that manages and allocates computational resources within a high-performance computing (HPC) cluster or multi-user computing environment. It is responsible for accepting, prioritizing, and assigning computational jobs submitted by users to the available resources, ensuring efficient utilization and fair distribution of resources among users.

Slurm (Simple Linux Utility for Resource Management) is an open-source, highly-scalable, and widely-used job scheduler designed for Linux-based HPC clusters. It facilitates the management of computational workloads by handling job submission, scheduling, dispatching, and monitoring tasks. Slurm is fault-tolerant, supporting the efficient execution of jobs across thousands of compute nodes.

Miscellany

Reciprocal BLAST

A reciprocal BLAST (Basic Local Alignment Search Tool) analysis is a technique used to identify homologous genes or proteins between two species by performing a BLAST search in both directions. In other words, it involves searching for similarities between a query sequence from species A against a database of sequences from species B, and then searching for similarities between a query sequence from species B against a database of sequences from species A. This approach helps confirm orthology, which is useful in comparative genomics and evolutionary studies.

Three insightful visualizations of reciprocal BLAST analysis results could be:

  1. Circos Plot: A Circos plot is a circular layout that can display relationships between genomic sequences. In the context of reciprocal BLAST analysis, a Circos plot can effectively illustrate orthologous relationships between two species by connecting homologous genes or proteins with lines or curves. The strength of the connection (e.g., based on similarity scores or e-values) can be represented by the width or color of the lines, providing a comprehensive overview of the conservation between the two genomes.

  2. Heatmap: A heatmap is a graphical representation of data where values are represented by colors. In a reciprocal BLAST analysis, a heatmap can be used to visualize the similarity scores or e-values between pairs of orthologous genes or proteins. The rows and columns of the heatmap would represent genes or proteins from species A and species B, respectively, with the color intensity representing the strength of the orthologous relationship. This allows for easy identification of strongly conserved regions or potential functional similarities between the two species.

  3. Scatter Plot with Dot Matrix: A scatter plot with dot matrix can visualize the distribution of reciprocal best hits (RBH) based on similarity scores or e-values. The x-axis would represent the scores or e-values for species A against species B, and the y-axis would represent the scores or e-values for species B against species A. Each dot on the plot would represent a pair of orthologs, with its position determined by the score or e-value in both directions. This visualization allows users to identify potential outliers, observe the overall correlation between the two species, and assess the quality of the reciprocal BLAST analysis results.

Saving my PAT

install.packages("gitcreds")
gitcreds::gitcreds_set()

Chat GPT in Rstudio

Video of recent lab meeting