Dell PowerEdge C5220 test report
WHAT WE TESTED
Hot-swap server nodes let you reuse or repurpose servers easily when workloads
change; you no longer need to incur downtime by replacing the entire server
chassis.
Designed with power efficiency and maintainability in mind, the Dell PowerEdge
C5220 maximizes operating efficiency with a shared-infrastructure design. To learn
more about the Dell PowerEdge C5220 and the entire Dell PowerEdge C Series, visit
http://www.dell.com/us/enterprise/p/poweredge-cloud-servers.
To test the ability of the PowerEdge C5220 microserver to handle large-scale
data processing tasks, we used Hadoop, specifically the Cloudera Distribution Including
Apache Hadoop (CDH). Below, we briefly discuss Hadoop and the benchmark tool we
used, the MapReduce benchmark (mrbench).
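
mrbench gauges how responsively a cluster handles small jobs: it submits a small
MapReduce job repeatedly and reports the average time per run. As a minimal sketch of
that measurement pattern (not mrbench's actual implementation, which ships in Hadoop's
test jar), the Java driver below times repeated submissions of a near-empty job; the
run count and the /bench paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallJobTimer {
  public static void main(String[] args) throws Exception {
    final int runs = 50;  // hypothetical, analogous to mrbench's run-count option
    long totalMs = 0;
    for (int i = 0; i < runs; i++) {
      Job job = Job.getInstance(new Configuration(), "small-job-" + i);
      // No mapper or reducer classes are set, so Hadoop uses its identity
      // Mapper and Reducer; the job does almost no real work, and the
      // measured time is dominated by job setup and scheduling overhead.
      FileInputFormat.addInputPath(job, new Path("/bench/tiny-input"));
      FileOutputFormat.setOutputPath(job, new Path("/bench/out-" + i));
      long start = System.currentTimeMillis();
      job.waitForCompletion(false);
      totalMs += System.currentTimeMillis() - start;
    }
    System.out.println("Average job time (ms): " + totalMs / runs);
  }
}
```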
Hadoop, developed by the Apache Software Foundation, is an open-source
distributed computing framework that enables the analysis of large volumes of data for
specific purposes. Using Hadoop's framework, IT organizations and researchers can build
applications that tailor data analysis to each company's specific needs, even with
unstructured data. Many different markets, among them finance, IT, and retail, use
Hadoop because it can handle heterogeneous data, both structured and unstructured.
Hadoop can run across any number of machines with varied hardware,
spreading data across all available resources through its distributed file system, the
Hadoop Distributed File System (HDFS), and replicating data to minimize loss when
hardware malfunctions. The software detects hardware failures and works around them
to allow uninterrupted access to data. Because it runs on heterogeneous hardware, a
Hadoop cluster is scalable and flexible; it can expand to accommodate growing
databases and companies. It is also cost-effective, as it lets companies make effective
use of commodity hardware.
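
As a sketch of how an application interacts with HDFS, the fragment below writes a
small file and asks HDFS to keep three replicas of its blocks through Hadoop's
FileSystem API; the file path, contents, and replication factor are illustrative
assumptions, not settings from this test:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationExample {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml/hdfs-site.xml from the classpath, so this
    // connects to whatever cluster the client is configured for.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Illustrative path; HDFS splits the file into blocks and spreads
    // them across the cluster's DataNodes.
    Path path = new Path("/tmp/example.txt");
    try (FSDataOutputStream out = fs.create(path)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // Keep three copies of each block, so the data survives the
    // failure of any single node or disk.
    fs.setReplication(path, (short) 3);
  }
}
```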
MapReduce is a framework within Hadoop that provides the ability to process
extremely large datasets in parallel across the Hadoop cluster, shortening the overall
processing time greatly. MapReduce breaks input data down into chunks to be
processed across the Hadoop cluster. When an application is run on a Hadoop cluster,
MapReduce performs "map" tasks that process data in parallel. The data is then sent to
"reduce" tasks that reduce the information into a final result. This allows for faster data
processing using multiple nodes, while still producing a single, comprehensive, accurate
result.
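
To make the map and reduce phases concrete, here is the standard introductory
word-count job written against Hadoop's Java MapReduce API (the textbook example, not
a workload from our tests): mappers emit a (word, 1) pair for every word in their
chunk of input, and the reducer for each word sums those pairs into one final count.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // "Map" phase: each mapper receives one chunk of the input and emits
  // (word, 1) pairs, running in parallel across the cluster.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // "Reduce" phase: all counts for a given word are routed to one
  // reducer, which combines them into a single final total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this runs with hadoop jar wordcount.jar WordCount
<input> <output>, and the framework distributes the map tasks across the cluster's
nodes automatically.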