
Several key technologies of big data architecture


Rebuilding an enterprise IT infrastructure platform is a complex task. Replatforming is often triggered by a changing set of key business drivers, and that is exactly what is happening now. Simply put, the platforms that have dominated enterprise IT technology for nearly 30 years can no longer meet the demands of the workloads needed to drive business forward.


The core of digital transformation is data, which has become the most valuable asset a business owns. Yet organizations have long struggled to make use of the data they collect because of incompatible formats, the limitations of traditional databases, and the inability to flexibly combine data from multiple sources. A new generation of technologies promises to change all of this.

Improving software deployment models is a major aspect of removing barriers to data usage. Greater “data agility” also requires more flexible databases and more scalable real-time streaming platforms. In fact, there are at least seven foundational technologies that can be combined to provide enterprises with a flexible, real-time "data fabric."

Unlike the technologies they are replacing, these seven software innovations are able to scale to meet the needs of many users and many use cases. For businesses, they mean the ability to make faster, better-informed decisions and to create better customer experiences.

1. NoSQL Databases

Relational database management systems (RDBMS) have dominated the database market for nearly 30 years. However, in the face of ever-growing data volumes and ever-faster processing requirements, traditional relational databases have shown their limitations. NoSQL databases are taking over thanks to their speed and ability to scale. Document databases in particular offer a simpler model from a software engineering perspective. This simpler development model shortens time to market and helps businesses respond more quickly to the needs of customers and internal users.
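
To make the document model concrete, here is a minimal sketch, assuming a local MongoDB instance and the pymongo driver; the database, collection, and fields are purely illustrative. A customer and their orders live in a single nested document, so there is no schema migration or join logic to maintain.

```python
# Minimal sketch: storing and querying a customer profile as one document.
# Assumes a local MongoDB instance and pymongo; all names are illustrative.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]                    # hypothetical database name
profiles = db["customer_profiles"]     # hypothetical collection name

# Insert a nested document directly; no schema migration or JOIN tables needed.
profiles.insert_one({
    "customer_id": 42,
    "name": "Alice",
    "orders": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-205", "qty": 1},
    ],
})

# Query by a field inside the nested structure.
for doc in profiles.find({"orders.sku": "A-100"}):
    print(doc["name"], len(doc["orders"]))
```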

2. Real-Time Streaming Platforms

Responding to customers in real time is crucial to the customer experience. It is no secret that consumer-facing industries have experienced massive disruption over the past decade, and much of that disruption comes down to a business's ability to react to users in real time. Moving to a real-time model requires event streaming.

Message-driven applications have been around for many years, but today's streaming platforms operate at far greater scale and far lower cost than ever before. Recent advances in streaming technology have opened the door to many new ways of optimizing a business. By providing real-time feedback loops for software development and testing teams, event streaming can also help enterprises improve product quality and develop new software faster.
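
As an illustration of the event streaming model, here is a minimal sketch, assuming a Kafka broker on localhost and the kafka-python package; the topic name and event fields are invented for the example. One service publishes user interactions as events, and another reacts to them as they arrive.

```python
# Minimal sketch: publishing and consuming click events in real time.
# Assumes a Kafka broker on localhost:9092 and kafka-python; names are illustrative.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# Each user interaction becomes an event on the stream.
producer.send("user-clicks", {"user_id": 42, "page": "/pricing"})
producer.flush()

# A downstream service reacts to the same events as they arrive.
consumer = KafkaConsumer(
    "user-clicks",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for event in consumer:
    print("react to", event.value)   # e.g. update a recommendation in real time
```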

3. Docker and Containers

Containers have great benefits for developers and operators, as well as for the organization itself. The traditional approach to infrastructure isolation is static partitioning, which assigns each workload a separate, fixed block of resources (whether a physical server or a virtual machine). Static partitioning can make troubleshooting easier, but it comes at the high cost of substantially underutilized hardware. For example, the average web server uses only about 10% of the total available computing power.

The huge benefit of container technology is that it creates a new way of achieving isolation. Some may believe they can get the same benefits by using automation tools such as Ansible, Puppet, or Chef, but in fact these technologies are highly complementary to containers. Moreover, no matter how hard enterprises try, these automation tools alone cannot achieve the isolation needed to move workloads freely between different infrastructure and hardware setups. The same container can run on bare metal in an on-premises data center or on a virtual machine in the public cloud without any changes. This is true workload mobility.
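
As a small illustration of that portability, the sketch below runs a stock image through the Docker SDK for Python; it assumes a locally running Docker engine and the `docker` package, and the image and command are only examples. The same call works unchanged whether the engine sits on a laptop, on bare metal, or on a cloud VM.

```python
# Minimal sketch: the same container image runs unchanged wherever a Docker
# engine is available. Assumes the Docker SDK for Python (pip install docker)
# and a running engine; the image and command are illustrative.
import docker

client = docker.from_env()  # talks to whichever Docker engine is configured locally

# Pull and run a stock image, then clean up the container after it exits.
output = client.containers.run(
    "python:3.11-slim",
    ["python", "-c", "print('hello from a container')"],
    remove=True,
)
print(output.decode())
```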

4. Container Repositories

Container repositories are critical to agility. Without a DevOps process for building container images and a repository to store them, each container would have to be built on every machine before it could run. With a repository, a container image can be launched on any machine that can read from that repository. Things become more complex when working across multiple data centers: if you build a container image in one data center, how do you move it to another? Ideally, by leveraging a converged data platform, enterprises can mirror repositories between data centers.

A key detail here is that mirroring between on-premises infrastructure and the cloud can differ significantly from mirroring between an enterprise's own data centers. A converged data platform solves this problem by providing these capabilities regardless of whether the organization runs on data center or cloud infrastructure.
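
To make the repository idea concrete, here is a hedged sketch of tagging and pushing a locally built image so that other machines, or a mirrored repository in another data center, can pull it. It assumes the Docker SDK for Python; the registry host and image name are placeholders.

```python
# Minimal sketch: publishing a locally built image to a registry so any other
# machine (or data center) can pull it. Assumes the Docker SDK for Python;
# the registry host and image name are placeholders.
import docker

client = docker.from_env()

# Take an image that was built locally and tag it for the remote registry.
image = client.images.get("myapp:1.0")
image.tag("registry.example.com/team/myapp", tag="1.0")

# Push to the registry; a mirrored registry in another data center would
# replicate this image so it can be launched there as well.
for line in client.images.push(
    "registry.example.com/team/myapp", tag="1.0", stream=True, decode=True
):
    print(line)
```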

5. Container Orchestration

With containers, each workload appears to have its own private operating system rather than a static hardware partition. Unlike virtual machines, containers do not require static partitioning of compute and memory, which lets administrators launch large numbers of containers on a server without having to worry about exact amounts of memory. With container orchestration tools like Kubernetes, it becomes very easy to launch containers, move them, and restart them elsewhere in the environment.
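
As a brief illustration, the sketch below asks a cluster to run and maintain three replicas of a container image, using the official `kubernetes` Python client; the cluster configuration, names, and image are assumptions made for the example.

```python
# Minimal sketch: asking an orchestrator to run and maintain three replicas of
# a container image. Assumes the official `kubernetes` Python client and a
# cluster reachable via ~/.kube/config; all names are illustrative.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="web",
                        image="registry.example.com/team/myapp:1.0",
                    )
                ]
            ),
        ),
    ),
)

# Kubernetes schedules the pods, restarts failed ones, and can move them
# between nodes without manual intervention.
apps.create_namespaced_deployment(namespace="default", body=deployment)
```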

Once the new infrastructure components are in place (a document database such as MapR-DB or MongoDB, an event streaming platform such as MapR-ES or Apache Kafka, an orchestration tool such as Kubernetes, and a DevOps process for building and deploying software in Docker containers), the next question is which components should actually be deployed in these containers.

6. Microservices

The concept of microservices is not new. What is different today is that the enabling technologies (NoSQL databases, event streaming, container orchestration) can scale to support thousands of microservices. Without these new approaches to data storage, event streaming, and orchestration, large-scale microservice deployments would not be possible: the infrastructure required to manage the volumes of data, events, and container instances simply would not scale to the required levels.

Microservices are all about agility. A microservice usually consists of a single function or a small set of functions, and the smaller and more focused each unit of work is, the easier it is to create, test, and deploy the service. These services must be decoupled, or the enterprise loses the agility that microservices promise. Microservices can depend on other services, but typically through a load-balanced REST API or through event streaming. With event streaming, request and response topics let enterprises easily track the full history of an exchange. This approach has significant troubleshooting benefits, since the entire request flow and all of the data in the request can be replayed at any point in time.
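
The following sketch illustrates the request/response-topic pattern described above, again assuming a Kafka broker on localhost and the kafka-python package; the topic names, message fields, and lookup function are invented for the example.

```python
# Minimal sketch: a microservice decoupled through request and response topics.
# Assumes a Kafka broker on localhost:9092 and kafka-python; names are illustrative.
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "profile-requests",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def lookup_profile(user_id):
    # Stand-in for the service's single, focused unit of work.
    return {"user_id": user_id, "name": "Alice"}

for request in consumer:
    result = lookup_profile(request.value["user_id"])
    # The response goes back onto the stream; because every request and
    # response is an event, the whole exchange can be replayed later.
    producer.send("profile-responses", result)
```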

Because microservices encapsulate a small piece of work, and because they are decoupled from each other, there are few barriers to replacing or upgrading services over time. In legacy systems, tight coupling such as RPC meant that all connections had to be closed and then re-established. Load balancing was also a major problem in such implementations, because manual configuration made it error-prone.

7. Functions as a Service

Just as microservices have come to dominate the industry, so too will we see the rise of serverless computing, or perhaps more accurately, Functions as a Service (FaaS). FaaS creates microservices in such a way that the code can be wrapped in a lightweight framework, built into a container, executed on demand (based on some kind of trigger), and then load balanced automatically, all thanks to that lightweight framework. The beauty of FaaS is that it lets developers focus almost entirely on the function itself, which makes it look like the logical conclusion of the microservices approach.

Triggering events are a key component of FaaS: they make it possible to invoke functions, and to consume resources, only when there is actual work to be done. This automatic invocation of functions is what makes FaaS truly valuable. Imagine that every time someone reads a user's profile an audit event occurs, triggering a function that notifies the security team. More specifically, the function might filter out only certain types of records. Being a fully customizable business function, it is entirely optional. The important point is that such a workflow is very simple to implement with a deployment model like FaaS.
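
The audit scenario above can be sketched as a locally simulated trigger: a function that runs only when a matching event arrives on the stream. This is not any particular FaaS platform's API, just an illustration of the trigger-to-function flow; the topic, event shape, and filtering rule are assumptions, and the stream is again Kafka via kafka-python.

```python
# Minimal sketch of the trigger idea: a function is invoked only when a
# matching event arrives on the stream. The topic name, event fields, and
# audit rule are illustrative, not a real FaaS platform's API.
import json
from kafka import KafkaConsumer

def audit_profile_read(event):
    # The "function": runs only when triggered, then returns.
    if event.get("record_type") == "sensitive":
        print("notify security team:", event["profile_id"], event["reader"])

consumer = KafkaConsumer(
    "profile-read-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# No function resources are consumed until an event actually occurs.
for message in consumer:
    audit_profile_read(message.value)
```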

Putting Events Together

The magic behind triggering services is really just events in the event stream. Certain types of events are used as triggers more often than others, but any event the business wants to act on can become a trigger. A triggering event could be a document update, which then runs an OCR process on the new document and adds the extracted text to a NoSQL database. More interestingly, whenever an image is uploaded, image recognition and scoring could be performed through a machine learning framework. There are no fundamental limitations here: a trigger event is defined, the event occurs, the event triggers the function, and the function does its work.

FaaS will be the next stage in the adoption of microservices. However, there is one major factor that must be considered when approaching FaaS, and that is vendor lock-in. FaaS hides the specific storage mechanisms, hardware infrastructure, and orchestration, which is great for developers. But because of this abstraction, hosted FaaS offerings represent one of the greatest opportunities for vendor lock-in that the IT industry has ever seen. Because these APIs are not standardized, migrating away from a FaaS product in the public cloud is nearly impossible without losing nearly 100% of the work that has been done. If FaaS is approached in a more organized manner, by leveraging events from a converged data platform, moving between cloud providers becomes much easier.
