If an enterprise needs high-performance computing to process its big data, it may work best to operate it on-premises. Here's what businesses need to know, including how high-performance computing and Hadoop differ.
In the field of big data, not every company needs high-performance computing (HPC), but almost all companies using big data have adopted Hadoop-style analytical computing.
The difference between HPC and Hadoop can be difficult to pin down, because Hadoop analytics jobs can run on HPC hardware, but not vice versa. Both HPC and Hadoop analytics use parallel data processing, but in a Hadoop environment the data is stored on commodity hardware and distributed across multiple nodes of that hardware. In HPC, file sizes are much larger and the data is stored centrally; those large files demand high throughput and low latency, which in turn require more expensive network communications such as InfiniBand.
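The parallel-processing model the two approaches share can be sketched in a few lines of Python. This is a toy illustration of the Hadoop-style pattern only, not real Hadoop code: the data set is partitioned across hypothetical "nodes", each node computes a local result on its own shard (map), and the partial results are merged (reduce).

```python
from collections import Counter
from functools import reduce

# Toy sketch of Hadoop-style parallelism: the data set is partitioned
# across "nodes", each node computes a local result on its own shard
# (map), and the partial results are merged (reduce).
def map_shard(shard):
    """Each node counts words in its own local shard of the data."""
    return Counter(word for line in shard for word in line.split())

def run_job(lines, num_nodes=3):
    # Distribute the data across the nodes, as a Hadoop cluster would.
    shards = [lines[i::num_nodes] for i in range(num_nodes)]
    partials = [map_shard(s) for s in shards]   # runs in parallel in practice
    return reduce(lambda a, b: a + b, partials)  # combine partial counts

print(run_job(["big data big", "hpc data"]))
```

In a real cluster each shard lives on the disk of its own node and the map step runs where the data already sits; that data locality is exactly what commodity Hadoop hardware exploits and what centralized HPC storage does differently.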
The takeaway for enterprise CIOs is clear: if an enterprise can avoid HPC and meet its analytics needs with Hadoop alone, it should do so. That approach is cheaper, easier for employees to operate, and can even run in the cloud, where a third party such as a vendor can operate it.
Unfortunately, for the enterprises and institutions in life sciences, meteorology, pharmaceuticals, mining, medicine, government, and academia whose workloads demand HPC-class processing, adopting Hadoop instead is not an option. Because of the very large file sizes and the extremely strict processing requirements, a commodity cluster or the cloud is not a good solution.
In short, high-performance computing (HPC) is the prime example of a big data platform that runs inside a company's own data center. Because of this, the challenge for these companies is ensuring that the hardware they invest in so heavily does the job it needs to do.
Alex Lesser, chief strategy officer of big data Hadoop and HPC platform provider PSSC Labs, said: "This is a challenge faced by many companies that must use HPC to process their big data. Most of these companies have traditional IT infrastructure behind them, so they naturally take a build-it-yourself approach to a Hadoop analytics environment, because it uses commodity hardware they are already familiar with. For HPC, though, the usual response is to hand the processing over to a vendor."
Companies considering adopting high-performance computing (HPC) need to take the following four steps:
1. Ensure senior-level support for high-performance computing (HPC)
Senior managers and board members do not need to be experts in high-performance computing, but they cannot be without a basic understanding of it, and their support is essential. They should understand HPC well enough to clearly back the large investments in hardware, software, and training the enterprise may have to make. That means they must be educated on two points: (1) what HPC is, and why it differs from ordinary analytics and requires special hardware and software; and (2) why the company needs HPC rather than legacy analytics to achieve its business goals. Both education efforts should be the responsibility of the chief information officer (CIO) or chief development officer (CDO).
Lesser said: "The companies that are most aggressive in adopting HPC are the ones that believe they are real technology companies, pointing to the Amazon Web Services cloud service, which started as a retail business for Amazon.com and has become a huge profit. Center.”
2. Consider a pre-configured hardware platform that can be customized
Companies such as PSSC Labs offer pre-packaged and pre-configured HPC hardware. "We have a base package based on HPC best practices and work with customers to customize that base package based on the customer's computing needs," Lesser said, noting that almost every data center must have some customization.
3. Understand the return
As with any IT investment, HPC must be cost-effective, and the business should be able to achieve a return on investment (ROI) that has been made clear to management and the board up front. "A good example is aircraft design," Lesser said. "High-performance computing is a huge investment, but it pays for itself quickly when a company discovers it can use HPC to simulate designs with five-nines accuracy and no longer has to rent a physical wind tunnel."
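The ROI case can be framed as simple payback arithmetic. All figures below are hypothetical placeholders for illustration, not numbers from Lesser or PSSC Labs; substitute real vendor pricing and real facility costs.

```python
# Back-of-envelope payback sketch for the wind-tunnel example.
# Every figure here is a hypothetical placeholder, not real pricing.
hpc_capital_cost = 2_000_000             # one-time HPC cluster investment ($)
annual_operating_cost = 300_000          # power, cooling, staff ($/year)
wind_tunnel_cost_per_campaign = 500_000  # rental avoided per test campaign ($)
campaigns_per_year = 6

annual_savings = wind_tunnel_cost_per_campaign * campaigns_per_year
net_annual_benefit = annual_savings - annual_operating_cost
payback_years = hpc_capital_cost / net_annual_benefit
print(f"Payback period: {payback_years:.1f} years")
```

The same three inputs, capital cost, net annual benefit, and avoided spending, are what management and the board need to see before the purchase is approved.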
4. Train your own IT staff
HPC computing is not an easy transition for your IT staff, but if you want to run an on-premises operation, you should position the team for self-sufficiency.
Initially, businesses may need to hire outside consultants to get started. But the goal of a consulting engagement should always be twofold: (1) keep the HPC application running, and (2) transfer knowledge to employees so they can take over operations. Businesses should not settle for less than both.
At the heart of the HPC team is a data scientist who can develop the highly complex algorithms that high-performance computing requires to answer the enterprise's questions. The team also needs a programmer with strong C++ or Fortran skills who can work on powerful systems in a parallel processing environment, as well as an expert in network communications.
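The parallel-processing work described above follows a standard decomposition pattern, shown here as a Python toy (a stand-in for what an HPC shop would actually write in C++ or Fortran with MPI or OpenMP): one centrally stored array is split into index ranges, and each range is handed to a worker thread. Under Python's GIL the CPU work is not truly concurrent, but the structure is the same.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy HPC-style kernel: one large, centrally stored array, with index
# ranges farmed out to workers sharing memory. Production HPC code
# would use C++/Fortran with MPI or OpenMP instead of Python threads.
def partial_sum_of_squares(data, start, stop):
    """One worker's share: sum of squares over its index range."""
    return sum(x * x for x in data[start:stop])

def sum_of_squares(data, workers=4):
    chunk = (len(data) + workers - 1) // workers  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(partial_sum_of_squares, data, i, i + chunk)
                   for i in range(0, len(data), chunk)]
        return sum(f.result() for f in futures)  # combine partial results

print(sum_of_squares(list(range(1000))))  # matches the serial result
```

The skill the team needs is exactly this kind of decomposition: choosing chunk boundaries, minimizing communication between workers, and combining partial results correctly.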
"The bottom line is that if an enterprise is running jobs once or twice every two weeks, it should go to the cloud to host its HPC," Lesser said. "But if an enterprise is using HPC resources and running jobs, such as a pharmaceutical company Or a biology company may run it multiple times a day, then running it in the cloud would be a waste of money and should consider running their own in-house operations.”