In the previous segment, you learnt how to set up a t2.micro EC2 instance. This type of instance is suitable for minor tasks, but heavier tasks, such as those involved in processing big data, need a more powerful setup that can handle them smoothly while keeping the cost of your instances in check.
This is where Amazon EMR comes into play. It is a managed cluster platform that helps us run big data frameworks, such as Apache Hadoop, on top of Amazon EC2 instances. In the following video, you will learn more about the service, its features, use cases and benefits.
Amazon EMR is a managed cluster platform that helps us run big data frameworks, such as Apache Hadoop, on top of Amazon EC2 instances to process and analyse vast amounts of data. It helps us set up a multi-node managed cluster with a complete setup of essential big data tools, such as Apache Spark and Apache HBase.
You also learnt about the lifecycle of an EMR cluster, which the following image illustrates.
EMR has various use cases in the industry, such as machine learning, ETL data pipelines, real-time streaming and interactive data analytics.
There are various benefits of using an EMR cluster. Some of these are as follows:
- Cost Saving – EMR cluster pricing depends on the instance type and the number of instances, both of which can be changed according to requirements.
- AWS Integration – EMR integrates seamlessly with other AWS services, such as Amazon S3 and Amazon CloudWatch.
- Deployment – EMR can easily configure EC2 instances with the applications and other configurations that you choose at launch.
- Scalability and Flexibility – EMR can easily scale the cluster up or down based on computing needs and can be used with several file systems.
As you saw in the video, various tools can be installed during the setup of an EMR cluster. These include popular big data processing applications such as:
- Apache Sqoop
- Apache HBase
- Apache Pig
- Apache Spark
- Apache Hadoop and MapReduce
- Apache Mahout
- Apache Oozie
You will learn many of these tools in the coming modules with the help of Amazon EMR clusters.
Finally, there is a service known as Amazon EMR Notebooks. It can be used along with EMR clusters running Apache Spark to create and open Jupyter notebooks.
You will use this service extensively during the Apache Spark modules.
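To make the points about deployment and scalability more concrete, here is a minimal Python sketch of what a cluster launch request might look like. It only builds a dictionary and does not contact AWS. The helper names `build_cluster_request` and `total_instances`, the cluster name, the instance type `m5.xlarge` and the release label `emr-6.15.0` are assumptions chosen for illustration; the dictionary keys are intended to follow the shape accepted by the `run_job_flow` call in boto3's EMR client, so check the official documentation before relying on them.

```python
# Hypothetical helper: describes an EMR cluster as a plain dictionary.
# Nothing here talks to AWS; it only illustrates which choices are made at launch.
def build_cluster_request(name, applications, instance_type, core_count,
                          release_label="emr-6.15.0"):  # release label is an assumption
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Applications chosen at launch, e.g. Hadoop, Spark, HBase
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names created by EMR; your account may use different ones
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def total_instances(request):
    """Count the EC2 instances the request asks for across all instance groups."""
    groups = request["Instances"]["InstanceGroups"]
    return sum(group["InstanceCount"] for group in groups)


request = build_cluster_request(
    name="demo-cluster",                       # hypothetical cluster name
    applications=["Hadoop", "Spark", "HBase"],
    instance_type="m5.xlarge",                 # assumed instance type
    core_count=2,
)
print(total_instances(request))
```

Because the instance type and the number of core nodes are plain parameters, changing either one is how the size of the cluster, and therefore its cost, would be adjusted in this sketch.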
In the next segment, you will learn how to set up an Amazon EMR cluster.