Dell PowerEdge C5220 test report
WHAT WE TESTED
Hot-swap server nodes let you reuse or repurpose servers easily when workloads
change; you no longer need to incur downtime by replacing the entire server
chassis.
Designed with power efficiency and maintainability in mind, the Dell PowerEdge
C5220 maximizes operating efficiency with a shared-infrastructure design. To learn
more about the Dell PowerEdge C5220 and the entire Dell PowerEdge C Series, visit
http://www.dell.com/us/enterprise/p/poweredge-cloud-servers.
To test the ability of the PowerEdge C5220 microserver to handle large-scale
data processing tasks, we used Hadoop, specifically the Cloudera Distribution Including
Apache Hadoop (CDH). Below, we briefly discuss Hadoop and the benchmark tool we
used, the MapReduce benchmark (mrbench).
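
mrbench gauges how responsively a cluster handles small jobs: it submits a small
MapReduce job repeatedly and reports the average time per run. As a minimal sketch of
that measurement pattern (not mrbench's actual implementation, which ships in Hadoop's
test jar), the Java driver below times repeated submissions of a near-empty job; the
run count and the /bench paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallJobTimer {
  public static void main(String[] args) throws Exception {
    final int runs = 50;  // hypothetical, analogous to mrbench's run-count option
    long totalMs = 0;
    for (int i = 0; i < runs; i++) {
      Job job = Job.getInstance(new Configuration(), "small-job-" + i);
      // No mapper or reducer classes are set, so Hadoop uses its identity
      // Mapper and Reducer; the job does almost no real work, and the
      // measured time is dominated by job setup and scheduling overhead.
      FileInputFormat.addInputPath(job, new Path("/bench/tiny-input"));
      FileOutputFormat.setOutputPath(job, new Path("/bench/out-" + i));
      long start = System.currentTimeMillis();
      job.waitForCompletion(false);
      totalMs += System.currentTimeMillis() - start;
    }
    System.out.println("Average job time (ms): " + totalMs / runs);
  }
}
```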
Hadoop, developed by the Apache Software Foundation, is an open-source
distributed computing framework that enables the analysis of large volumes of data for
specific purposes. Using Hadoop's framework, IT organizations and researchers can build
applications that tailor data analysis to each company's specific needs, even with
unstructured data. Many different markets, among them finance, IT, and retail, use
Hadoop because it can handle heterogeneous data, both structured and unstructured.
Hadoop can run across any number of machines with varied hardware,
spreading data across all available resources through its distributed file system, the
Hadoop Distributed File System (HDFS), and replicating data to minimize loss when
hardware malfunctions. The software detects hardware failures and works around them
to allow uninterrupted access to data. Because it runs on heterogeneous hardware, a
Hadoop cluster is scalable and flexible; it can expand to accommodate growing
databases and companies. It is also cost-effective, as it lets companies make effective
use of commodity hardware.
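
As a sketch of how an application interacts with HDFS, the fragment below writes a
small file and asks HDFS to keep three replicas of its blocks through Hadoop's
FileSystem API; the file path, contents, and replication factor are illustrative
assumptions, not settings from this test:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationExample {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml/hdfs-site.xml from the classpath, so this
    // connects to whatever cluster the client is configured for.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Illustrative path; HDFS splits the file into blocks and spreads
    // them across the cluster's DataNodes.
    Path path = new Path("/tmp/example.txt");
    try (FSDataOutputStream out = fs.create(path)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // Keep three copies of each block, so the data survives the
    // failure of any single node or disk.
    fs.setReplication(path, (short) 3);
  }
}
```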
MapReduce is a framework within Hadoop that provides the ability to process
extremely large datasets in parallel across the Hadoop cluster, shortening the overall
processing time greatly. MapReduce breaks input data down into chunks to be
processed across the Hadoop cluster. When an application is run on a Hadoop cluster,
MapReduce performs "map" tasks that process data in parallel. The data is then sent to
"reduce" tasks that reduce the information into a final result. This allows for faster data
processing using multiple nodes, while still producing a single, comprehensive, accurate
result.
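
To make the map and reduce phases concrete, here is the standard introductory
word-count job written against Hadoop's Java MapReduce API (the textbook example, not
a workload from our tests): mappers emit a (word, 1) pair for every word in their
chunk of input, and the reducer for each word sums those pairs into one final count.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // "Map" phase: each mapper receives one chunk of the input and emits
  // (word, 1) pairs, running in parallel across the cluster.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // "Reduce" phase: all counts for a given word are routed to one
  // reducer, which combines them into a single final total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this runs with hadoop jar wordcount.jar WordCount
<input> <output>, and the framework distributes the map tasks across the cluster's
nodes automatically.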