
Dell PowerEdge C5220: Hadoop MapReduce Performance

Hot-swap server nodes make it easy to reuse or repurpose servers when workloads change, so you no longer need to take downtime to replace the entire server chassis.

Designed with power efficiency and maintainability in mind, the Dell PowerEdge C5220 maximizes operating efficiency with a shared-infrastructure design. To learn more about the Dell PowerEdge C5220 and the entire Dell PowerEdge C Series, visit http://www.dell.com/us/enterprise/p/poweredge-cloud-servers.

WHAT WE TESTED

To test the ability of the PowerEdge C5220 microserver to handle large data-processing tasks, we used Hadoop, specifically Cloudera Distribution Including Apache Hadoop (CDH). Below, we briefly discuss Hadoop and the benchmark tool we used, the MapReduce benchmark (mrbench).

Hadoop

Hadoop, developed by the Apache Software Foundation, is an open-source distributed application framework that enables the analysis of large volumes of data. Using Hadoop's framework, IT organizations and researchers can build applications that tailor the data analysis to each company's specific needs, even when the data is unstructured. Many different markets (among them finance, IT, and retail) use Hadoop because it can handle heterogeneous data, both structured and unstructured.

Hadoop can run across many machines with varied hardware. It spreads data across all available hardware resources using a distributed file system, the Hadoop Distributed File System (HDFS), and replicates that data to minimize loss if a hardware malfunction occurs. The software can detect hardware failures and work around them to allow uninterrupted access to data. Because it can run on different hardware, a Hadoop cluster is scalable and flexible: it can be expanded to accommodate growing data volumes and growing companies. It is also cost-effective, because it allows companies to use commodity hardware effectively.

MapReduce

MapReduce is a framework within Hadoop that processes extremely large datasets in parallel across the Hadoop cluster, greatly shortening overall processing time. MapReduce breaks the input data into chunks to be processed across the cluster. When an application runs on a Hadoop cluster, MapReduce performs "map" tasks that process these chunks in parallel. The map output, grouped by key, is then sent to "reduce" tasks that combine the intermediate results into a final result. This allows faster data processing across multiple nodes while still producing a single, comprehensive, accurate result.
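The map, group-by-key, and reduce phases described above can be illustrated with the classic word-count example. The following is a minimal single-machine sketch in Python, assuming a thread pool stands in for the cluster's nodes and a list of text lines stands in for data stored in HDFS. It does not use Hadoop's actual API (Hadoop jobs are typically written against its Java Mapper and Reducer classes), and the function names `split_input`, `map_task`, `shuffle`, `reduce_task`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical illustration only: a thread pool stands in for cluster nodes.

def split_input(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Break the input into fixed-size chunks, one chunk per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

def map_task(chunk: List[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in this chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all intermediate values by key before they reach the reducers."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine every value collected for one key into a single result."""
    return key, sum(values)

def word_count(lines: List[str], chunk_size: int = 2, workers: int = 4) -> Dict[str, int]:
    chunks = split_input(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each chunk is processed by its own task, in parallel.
        mapped = list(pool.map(map_task, chunks))
        # Shuffle: intermediate pairs are grouped by key.
        groups = shuffle(mapped)
        # Reduce phase: each key's values are combined, also in parallel.
        reduced = pool.map(lambda kv: reduce_task(kv[0], kv[1]), groups.items())
        return dict(reduced)

if __name__ == "__main__":
    text = [
        "Hadoop stores data in HDFS",
        "MapReduce processes data in parallel",
        "reduce tasks combine map output",
    ]
    print(word_count(text))
```

With this sample input, `word_count` reports a count of 2 for "data" and "in" and 1 for every other word. The explicit `shuffle` step mirrors the grouping by key that happens between the two phases, which is what lets each reduce task see every value for its key even though the map tasks ran independently.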

A Principled Technologies test report 4