Practical Hadoop Security

Bhushan Lakhe

Language: English

Pages: 220

ISBN: 1430265442

Format: PDF / Kindle (mobi) / ePub


Practical Hadoop Security is an excellent resource for administrators planning a production Hadoop deployment who want to secure their Hadoop clusters. A detailed guide to the security options and configuration within Hadoop itself, author Bhushan Lakhe takes you through a comprehensive, hands-on study of how to implement well-defined security controls within a Hadoop cluster.

You will start with a detailed overview of all the security options available for Hadoop, including popular extensions such as Kerberos and OpenSSH, and then delve into a hands-on implementation of user security (with illustrated code samples), using both out-of-the-box features and security extensions implemented by leading vendors.

No security system is complete without a monitoring and tracing facility, so Practical Hadoop Security next steps you through audit logging and monitoring technologies for Hadoop, as well as ready-to-use implementation and configuration examples, again with illustrated code samples.

The book concludes with the most important aspect of Hadoop security: encryption. Both types of encryption, for data in transit and data at rest, are discussed at length, along with leading open source projects that integrate directly with Hadoop at no licensing cost.

Practical Hadoop Security:

  • Explains the importance of security, auditing, and encryption within a Hadoop installation
  • Describes how the leading players have incorporated these features within their Hadoop distributions and provided extensions
  • Demonstrates how to set up and use these features to your benefit and make your Hadoop installation secure without impacting performance or ease of use



Permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. ISBN-13 (pbk): 978-1-4302-6544-3; ISBN-13 (electronic): 978-1-4302-6545-0. Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion.

Once you analyze your processes (both manual and automated) and classify your data well, you can determine who needs access to which specific resources and what level of access is required. That's the definition of fine-grained authorization: every user has the correct level of access to necessary resources. Fine-tuning Hadoop security to allow access driven by functional need will make your Hadoop cluster less vulnerable to hackers and unauthorized access, without sacrificing usability. In this chapter, you will…
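As a concrete illustration (not taken from the book), fine-grained access in HDFS can be expressed with POSIX-style ACLs once `dfs.namenode.acls.enabled` is set to `true` in hdfs-site.xml; the path and user name below are hypothetical:

```
# Grant one analytics user read/execute access to a single directory
hdfs dfs -setfacl -m user:analyst1:r-x /data/sales

# Verify the effective ACL on that directory
hdfs dfs -getfacl /data/sales
```

This grants access by functional need (one user, one directory, read-only) rather than handing out broad group membership.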

Data is generated by multiple daemon processes, and a particular type of data may exist in multiple places. How do you connect and analyze data from disjoint sources to get the total view of system operations, history, and state? The key is Hadoop's audit logs. This section will discuss which daemon processes generate which data, what kind of data is captured by auditing, and how you can use Hadoop audit logs for security purposes. To get a complete system picture, you need to understand what kind of data is logged by Hadoop.
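HDFS audit entries are essentially key=value pairs, which makes them easy to dissect programmatically. Below is a minimal sketch of such a parser; the sample line, user, and paths are hypothetical, and real audit formats vary slightly by Hadoop version:

```python
import re

# Hypothetical entry in the usual HDFS audit log style:
# key=value fields after the "FSNamesystem.audit:" prefix.
SAMPLE = ("2014-03-06 22:42:01,337 INFO FSNamesystem.audit: allowed=true "
          "ugi=RogueITGuy (auth:SIMPLE) ip=/127.0.0.1 cmd=rename "
          "src=/tmp/Ticket_details_20140220 "
          "dst=/user/hive/warehouse/ticket_details "
          "perm=RogueITGuy:supergroup:rw-r--r--")

def parse_audit_line(line):
    """Extract the key=value fields from one audit log entry."""
    return dict(re.findall(r"(\w+)=(\S+)", line))

entry = parse_audit_line(SAMPLE)
print(entry["ugi"], entry["ip"], entry["cmd"])
```

Once parsed into a dictionary, entries can be filtered by user (`ugi`), client address (`ip`), or operation (`cmd`) for exactly the kind of investigation described later in this chapter.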

Audit log files are written to the Hadoop log directory ($HADOOP_INSTALL is the directory where Hadoop is installed). The audit log file names are defined in the log4j.properties file, and the defaults are hdfs-audit.log (for the HDFS audit log) and mapred-audit.log (for the MapReduce audit log). You can't define audit logging for YARN using log4j.properties yet; this is still being worked on (see "Add YARN Audit Logging to log4j.properties," https://issues.apache.org/jira/browse/HADOOP-8392). To enable auditing, you need to modify the log4j.properties file.
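For illustration, the stock Hadoop log4j.properties typically contains entries along these lines for routing HDFS audit events to their own rolling file; exact property names can differ between Hadoop versions, so treat this as a sketch:

```
# Raise the audit logger from WARN to INFO and send it to its own appender
hdfs.audit.logger=INFO,RFAAUDIT
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false

# Rolling file appender writing to hdfs-audit.log in the Hadoop log directory
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
```

Setting `additivity` to false keeps audit events out of the general daemon log, so hdfs-audit.log holds only audit records.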

Conclusions: The partition for 2/20/14 was overwritten (ugi=RogueITGuy ip=/127.0.0.1 cmd=source:/127.0.0.1 append_partition: db=default tbl=ticket_details[2014,2,20]) for table Ticket_details by RogueITGuy. The file Ticket_details_20140220 was uploaded on 3/6/14 at 22:26, and the Hive partition was overwritten on 3/6/14 at 22:42 by the same user: RogueITGuy. Case closed! Last, investigators checked the jobs submitted by RogueITGuy. Several job-related logs provided details of the jobs users executed.
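The timeline reasoning above (same user, upload at 22:26, overwrite at 22:42) can be automated once events from the HDFS and Hive audit logs are normalized. Here is a minimal sketch with hypothetical event tuples that flags actions by the same user occurring within a short window of each other:

```python
from datetime import datetime, timedelta

# Hypothetical (timestamp, user, action) events collected from audit logs
events = [
    (datetime(2014, 3, 6, 22, 26), "RogueITGuy", "upload Ticket_details_20140220"),
    (datetime(2014, 3, 6, 22, 42), "RogueITGuy", "overwrite partition [2014,2,20]"),
    (datetime(2014, 3, 6, 23, 5),  "hiveadmin",  "ALTER TABLE ticket_details"),
]

def correlate(events, window=timedelta(hours=1)):
    """Pair up actions by the same user that occur within `window` of each other."""
    pairs = []
    for i, (t1, u1, a1) in enumerate(events):
        for t2, u2, a2 in events[i + 1:]:
            if u1 == u2 and t2 - t1 <= window:
                pairs.append((u1, a1, a2))
    return pairs

print(correlate(events))
```

With the sample data this surfaces the RogueITGuy upload/overwrite pair while ignoring the unrelated hiveadmin action, mirroring the manual cross-log correlation the investigators performed.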
