Being a tech geek, I am always looking for different sources to educate myself on new technical trends. So far I have found the World Wide Web (blogs, columns, webcasts, podcasts, etc.) to be the most convenient medium for keeping up a daily learning habit. That said, it is not easy to find a proper learning path for a given technology on the internet. Books fill this gap by suggesting and exemplifying a proper learning path. “HDInsight Essentials” by Rajesh Nadipalli is a perfect example of a book in that category.
HDInsight Essentials focuses on how to build and deploy a modern big data architecture using Microsoft’s HDInsight platform, through which organizations can derive insights and make better business decisions. The book teaches how to build a next-generation enterprise data lake, from ingestion through data transformation to analysis of big data. Rajesh Nadipalli brings 18 years of IT experience, along with architecture and technical leadership skills and real-world use cases, to this book. His expertise in building and maintaining Hadoop-based enterprise data lakes helps readers understand the pros and cons involved in the various steps of data lake construction and maintenance.
The strength of this book lies in the way it explains the step-by-step process of building a data lake. The author strategically uses a real-world project as a sample data lake construction, and airline data serves as the use case throughout the book. He breaks the process into logical steps and groups them into chapters, which makes the book easy to understand and ready to apply in real-life scenarios. The code samples in the book are mostly self-descriptive and fine-tuned.
At this juncture, I would like to convey my heartfelt thanks to Packt Publishing for giving me the opportunity to conduct a technical review and assessment of the book and to help make it better within the given scope. It was a great 3-month journey in which we iteratively worked through the technical content and provided consistent feedback and suggestions to the author.
Looking at the contents of the book, Chapter 1 gives a brief introduction to the Hadoop ecosystem and describes the business value of big data lakes. It also discusses HDInsight, Microsoft's Hadoop offering, and its deployment options.
Chapter 2 explains how to choose a path for constructing an enterprise data lake based on the Hadoop ecosystem. It describes a high-level use case with Azure HDInsight.
Chapter 3 shows how to create an HDInsight cluster step by step. It also describes how to maintain the cluster through the Azure management portal, and introduces developer tools such as the HDInsight Emulator for local simulation.
Chapter 4 explains how to administer the HDInsight cluster. It also covers Azure Blob storage management along with basic Azure PowerShell scripting.
Chapter 5 introduces a real-world data lake project. As a first step, it shows how to move data from various data sources to the HDInsight cluster using different techniques (Sqoop, HDFS commands, Azure Blob Storage explorer, etc.). This chapter also describes how to organize data using the HCatalog metastore.
Chapter 6 presents various mechanisms for transforming data in HDFS using Hive, Pig, and MapReduce. It also gives an introduction to Spark and Oozie.
Chapter 7 explains how to access data in the cluster and analyze it. It also explains how to use different data analytics connectors and drivers like PowerPivot and Excel Power Query, among others, and discusses alternatives such as Mahout and Giraph.
Chapter 8 provides insight into new features in HDInsight such as Tez and HBase. Finally, the book ends with Chapter 9, which discusses the challenges of building a production data lake and offers recommendations for a sustainable one.
I strongly recommend this book to all big data professionals who want to explore Microsoft's HDInsight offering. It will be a good reference on the bookshelf, helping with day-to-day work around big data and Microsoft Azure.
Details of the book:
- Pages : 178
- Publisher : Packt Publishing Ltd.; 2nd edition (January, 2015)
- ISBN-13 : 9781784399429