Hadoop

Easily develop apps that are capable of processing vast amounts of data

Hadoop Ranking & Summary

  • License: Apache
  • Price: FREE
  • Publisher Name: Apache Software Foundation
  • Publisher web site: http://www.apache.org/
  • Operating Systems: Mac OS X
  • File Size: 29.3 MB

Hadoop Description

Hadoop is a software platform for writing and running applications that process vast amounts of data.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides an application into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability and places them on compute nodes around the cluster, so MapReduce can then process the data where it is located.

Hadoop has been demonstrated on clusters with 2,000 nodes. The current design target is 10,000-node clusters.

Here are some key features of "Hadoop":

  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located, which keeps processing fast.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks when failures occur.

What's New in This Release:

Sub-task
  • Remove commons dependency on commons-cli2

Bug
  • API link in forrest doc should point to the same version of hadoop.
  • hadoop fs -help should list detailed help info for the following commands: test, text, tail, stat & touchz
  • Document JobInitializationPoller configuration in capacity scheduler forrest documentation.
  • Document TaskTracker's memory management functionality and CapacityScheduler's memory based scheduling.
  • Reduce Task Progress shows > 100% when the total size of map outputs (for a single reducer) is high
  • BZip2CompressionOutputStream NullPointerException
  • When the size required for a path is -1, LocalDirAllocator.getLocalPathForWrite fails with a DiskCheckerException when the disk it selects is bad.
  • Recovery duration shown on the jobtracker webpage is inaccurate
  • o.a.h.mapred.Merger not maintaining map out compression on intermediate files
  • Job is left in Running state after a killJob
  • Possible NPE in CapacityScheduler's MemoryMatcher
  • TestQueueCapacities is failing Hudson tests for the last few builds
  • Not able to generate gridmix.jar on already compiled version of hadoop
  • TestReplicationPolicy. fails on java.net.BindException
  • TestMRServerPorts fails on java.net.BindException
  • HftpFileSystem.getChecksum(..) does not work for the paths with scheme and authority
  • org.apache.hadoop.mapreduce.Reducer should not be abstract.
  • Change Namenode file close log to info
  • Capacity Scheduler should not check for presence of default queue while starting up.
  • Jobs failed during job initialization are never removed from Capacity Scheduler's waiting list
  • Update CapacityScheduler documentation to reflect latest changes
  • Errors encountered in MROutputThread after the last map/reduce call can go undetected
  • DFS Write pipeline does not detect defective datanode correctly in some cases (HADOOP-3339)
  • Use absolute path for JobTracker's mapred.local.dir in MiniMRCluster
  • map/reduce doesn't run jobs with 0 maps
  • mapred metrics shows negative count of waiting maps and reduces
  • TestQueueCapacit…
  • NameNode and SecondaryNameNode fail to restart because of abnormal filenames. (HADOOP-6017)
  • Multiple bugs w/ Hadoop archives
  • Incomplete help message is displayed for rm and rmr options.
  • hadoop 0.20 branch "test-patch" is broken
  • No error message for deleting non-existent file or directory.
  • fix GenericOptionParser to deal with -D with '=' in the value

Improvement
  • Remove pre-emption from the capacity scheduler code base

New Feature
  • New binary file format
  • Metric to show number of fs.exists (or number of getFileInfo) calls
  • Handling of Trash with quota
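
The MapReduce model mentioned in the description (an application divided into many small blocks of work whose results are then combined) can be illustrated with a small single-process sketch. This is not Hadoop's API: the function names (split_into_blocks, map_block, shuffle, reduce_group, word_count) and the block_size parameter are hypothetical, chosen only for illustration, and everything runs in one Python process rather than on a cluster.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Divide the input into small blocks of work, one per map task."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group the emitted values by key so each reduce call sees one word."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines, block_size=2):
    """Run the three phases in order; here sequentially, in one process."""
    blocks = split_into_blocks(lines, block_size)
    # On a real cluster each block would be mapped by a separate task,
    # ideally on a node that already holds that block of data.
    mapped = [map_block(block) for block in blocks]
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "hadoop stores data",
        "hadoop processes data",
        "data is replicated",
    ]
    print(word_count(sample))
```

The sketch keeps the map step independent per block, which is the property that lets a framework such as Hadoop run the blocks in parallel on different nodes.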

