Spark jobs running on Amazon EMR on EKS generate logs that are very helpful in identifying issues with Spark processes and as a way to see Spark output. You can access these logs from a variety of sources. On the Amazon EMR virtual cluster console, you can access logs from the Spark History UI. You also have the flexibility to push logs into an Amazon Simple Storage Service (Amazon S3) bucket or Amazon CloudWatch Logs. In each method, these logs are linked to the specific job in question. The common practice of log management in DevOps culture is to centralize logging through the forwarding of logs to an enterprise log aggregation system like Splunk or Amazon OpenSearch Service (successor to Amazon Elasticsearch Service). This enables you to see all the relevant log data in a single place. You can identify key trends, anomalies, and correlated events, troubleshoot problems faster, and notify the appropriate people in a timely fashion.

EMR on EKS Spark logs are generated by Spark and can be accessed via the Kubernetes API and the kubectl CLI. Therefore, although it's possible to install log forwarding agents in the Amazon Elastic Kubernetes Service (Amazon EKS) cluster to forward all Kubernetes logs, which include Spark logs, this can become quite expensive at scale because you get information about Kubernetes that may not be important to Spark users. In addition, from a security standpoint, the EKS cluster logs and access to kubectl may not be available to the Spark user.

To solve this problem, this post proposes using pod templates to create a sidecar container alongside the Spark job pods. The sidecar containers are able to access the logs contained in the Spark pods and forward these logs to the log aggregator. This approach allows the logs to be managed separately from the EKS cluster and uses a small amount of resources because the sidecar container is only launched during the lifetime of the Spark job.
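
Conceptually, the pattern is two containers in one pod sharing a log volume. The following minimal pod specification is a sketch only (all names and images here are illustrative; the actual pod templates used with EMR on EKS appear later in this post and differ in detail):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver-with-sidecar   # illustrative name
spec:
  volumes:
    - name: spark-logs              # volume shared by both containers
      emptyDir: {}
  containers:
    - name: spark-driver            # main container writes its logs here
      image: spark-image:example    # placeholder image
      volumeMounts:
        - name: spark-logs
          mountPath: /var/log/spark
    - name: log-forwarder           # sidecar reads the same directory
      image: fluent/fluent-bit:1.9
      volumeMounts:
        - name: spark-logs
          mountPath: /var/log/spark
          readOnly: true
```

Because the sidecar lives and dies with the pod, its resource footprint is bounded by the Spark job's own lifetime.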

Implementing Fluent Bit as a sidecar container

Fluent Bit is a lightweight, highly scalable, and high-speed logging and metrics processor and log forwarder. It collects event data from any source, enriches that data, and sends it to any destination. Its lightweight and efficient design, coupled with its many features, makes it very attractive to those working in the cloud and in containerized environments. It has been deployed extensively and trusted by many, even in large and complex environments. Fluent Bit has zero dependencies and requires only 650 KB of memory to operate, compared with FluentD, which needs about 40 MB of memory. Therefore, it's an ideal option as a log forwarder to forward logs generated from Spark jobs.

When you submit a job to EMR on EKS, there are at least two Spark containers: the Spark driver and the Spark executor. The number of Spark executor pods depends on your job submission configuration. If you specify more than one in spark.executor.instances, you get the corresponding number of Spark executor pods. What we want to do here is run Fluent Bit as sidecar containers with the Spark driver and executor pods. Diagrammatically, it looks like the following figure. The Fluent Bit sidecar container reads the indicated logs in the Spark driver and executor pods, and forwards these logs to the target log aggregator directly.
Pod templates in EMR on EKS

A Kubernetes pod is a group of one or more containers with shared storage and network resources, and a specification for how to run the containers. Pod templates are specifications for creating pods. They are part of the desired state of the workload resources used to run the application. Pod template files can define driver or executor pod configurations that aren't supported in standard Spark configuration. That being said, Spark is opinionated about certain pod configurations, and some values in the pod template are always overwritten by Spark. Using a pod template only allows Spark to start with a template pod instead of an empty pod during the pod building process. Pod templates are enabled in EMR on EKS when you configure the Spark properties spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile. Spark downloads these pod templates to construct the driver and executor pods.
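
For example, the two properties can be supplied in the Spark configuration of the job submission. The following fragment is a sketch (the bucket and prefix are placeholders you would substitute):

```json
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "spark-defaults",
      "properties": {
        "spark.kubernetes.driver.podTemplateFile": "s3://<bucket>/<prefix>/emr_driver_template.yaml",
        "spark.kubernetes.executor.podTemplateFile": "s3://<bucket>/<prefix>/emr_executor_template.yaml"
      }
    }
  ]
}
```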

Forward logs generated by Spark jobs in EMR on EKS

A log aggregation system like Amazon OpenSearch Service or Splunk should always be available to accept the logs forwarded by the Fluent Bit sidecar containers. If you don't have one, this post provides scripts to help you launch a log aggregation system, either Amazon OpenSearch Service or Splunk installed on an Amazon Elastic Compute Cloud (Amazon EC2) instance.

We use several services to create and configure EMR on EKS. We use an AWS Cloud9 workspace to run all of the scripts and to configure the EKS cluster. To prepare to run a job script that requires certain Python libraries absent from the generic EMR images, we use Amazon Elastic Container Registry (Amazon ECR) to store the customized EMR container image.

Create an AWS Cloud9 workspace

The first step is to launch and configure the AWS Cloud9 workspace by following the instructions in Create a Workspace in the EKS Workshop. After you create the workspace, we create AWS Identity and Access Management (IAM) resources. Create an IAM role for the workspace, attach the role to the workspace, and update the workspace IAM settings.

Prepare the AWS Cloud9 workspace

Clone the following GitHub repository and run the following script to prepare the AWS Cloud9 workspace to be ready to install and configure Amazon EKS and EMR on EKS. The shell script prepare_cloud9.sh installs all the necessary components for the AWS Cloud9 workspace to build and manage the EKS cluster. These include the kubectl command line tool, the eksctl CLI tool, jq, and an update of the AWS Command Line Interface (AWS CLI).

$ sudo yum -y install git
$ cd ~
$ git clone https://github.com/aws-samples/aws-emr-eks-log-forwarding.git
$ cd aws-emr-eks-log-forwarding
$ cd emreks
$ bash prepare_cloud9.sh

All of the necessary scripts and configuration to run this solution are found in the cloned GitHub repository.

Create a key pair

As part of this particular deployment, you need an EC2 key pair to create an EKS cluster. If you already have an existing EC2 key pair, you may use it. Otherwise, you can create a key pair.

Install Amazon EKS and EMR on EKS

After you configure the AWS Cloud9 workspace, in the same folder (emreks), run the following deployment script:

$ bash deploy_eks_cluster_bash.sh
Deployment Script -- EMR on EKS
-----------------------------------------------
Please provide the following information before deployment:
1. Region (If your Cloud9 desktop is in the same region as your deployment, you can leave this blank)
2. Account ID (If your Cloud9 desktop is running in the same Account ID as where your deployment will be, you can leave this blank)
3. Name of the S3 bucket to be created for the EMR S3 storage location
Region: [xx-xxxx-x]: < Press enter for default or enter region >
Account ID [xxxxxxxxxxxx]: < Press enter for default or enter account # >
EC2 Public Key name: < Provide your key pair name here >
Default S3 bucket name for EMR on EKS (do not add s3://): < bucket name >
Bucket created: XXXXXXXXXXX ...
Deploying CloudFormation stack with the following parameters...
Region: xx-xxxx-x | Account ID: xxxxxxxxxxxx | S3 Bucket: XXXXXXXXXXX
...
EKS Cluster and Virtual EMR Cluster have been installed.

The last line indicates that the installation was successful.

Log aggregation options

There are several log aggregation and management tools on the market. This post suggests two of the more popular ones in the industry: Splunk and Amazon OpenSearch Service.

Option 1: Install Splunk Enterprise

To manually install Splunk on an EC2 instance, complete the following steps:
a685
a685
- a685
- a685 Launch an EC2 occasion a685 .
- a685 Set up Splunk a685 .
- a685 Configure the EC2 occasion a685 safety group a685 to allow entry to a685 ports 22, 8000, and 8088.
This post, however, provides an automated way to install Splunk on an EC2 instance:

- Download the RPM install file and upload it to an accessible Amazon S3 location.
- Upload the following YAML script into AWS CloudFormation.
- Provide the required parameters, as shown in the following screenshots.
- Choose Next and complete the steps to create your stack.
Alternatively, run an AWS CLI script like the following:

aws cloudformation create-stack \
--stack-name "splunk" \
--template-body file://splunk_cf.yaml \
--parameters ParameterKey=KeyName,ParameterValue="< Name of EC2 Key Pair >" \
ParameterKey=InstanceType,ParameterValue="t3.medium" \
ParameterKey=LatestAmiId,ParameterValue="/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2" \
ParameterKey=VPCID,ParameterValue="vpc-XXXXXXXXXXX" \
ParameterKey=PublicSubnet0,ParameterValue="subnet-XXXXXXXXX" \
ParameterKey=SSHLocation,ParameterValue="< CIDR Range for SSH access >" \
ParameterKey=VpcCidrRange,ParameterValue="172.20.0.0/16" \
ParameterKey=RootVolumeSize,ParameterValue="100" \
ParameterKey=S3BucketName,ParameterValue="< S3 Bucket Name >" \
ParameterKey=S3Prefix,ParameterValue="splunk/splunk-8.2.5-77015bc7a462-linux-2.6-x86_64.rpm" \
ParameterKey=S3DownloadLocation,ParameterValue="/tmp" \
--region < region > \
--capabilities CAPABILITY_IAM

- After you build the stack, navigate to the stack's Outputs tab on the AWS CloudFormation console and note the internal and external DNS for the Splunk instance.

You use these later to configure the Splunk instance and log forwarding.

- To configure Splunk, go to the Resources tab for the CloudFormation stack and locate the physical ID of EC2Instance.
- Choose that link to go to the specific EC2 instance.
- Select the instance and choose Connect.

- On the Session Manager tab, choose Connect.

You're redirected to the instance's shell.

- Install and configure Splunk as follows:

$ sudo /opt/splunk/bin/splunk start --accept-license
…
Please enter an administrator username: admin
Password must contain at least:
   * 8 total printable ASCII character(s).
Please enter a new password:
Please confirm new password:
…
Done
                                                           [  OK  ]
Waiting for web server at http://127.0.0.1:8000 to be available......... Done
The Splunk web interface is at http://ip-xx-xxx-xxx-x.us-east-2.compute.internal:8000

- Enter the Splunk site using the SplunkPublicDns value from the stack outputs (for example, http://ec2-xx-xxx-xxx-x.us-east-2.compute.amazonaws.com:8000). Note the port number of 8000.
- Log in with the user name and password you provided.
Configure HTTP Event Collector

To configure Splunk to be able to receive logs from Fluent Bit, configure the HTTP Event Collector data input:

- Go to Settings and choose Data inputs.
- Choose HTTP Event Collector.
- Choose Global Settings.
- Select Enabled, keep port number 8088, then choose Save.
- Choose New Token.
- For Name, enter a name (for example, emreksdemo).
- Choose Next.
- For Available item(s) for Indexes, add at least the main index.
- Choose Review and then Submit.
- In the list of HTTP Event Collector tokens, copy the token value for emreksdemo.
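
Before wiring the token into Fluent Bit, it can be useful to sanity-check it by sending a test event to the collector endpoint. The following Python sketch only builds the request that a client such as curl would send to HEC on port 8088; the host and token values are placeholders, and actually sending the request is left to your HTTP client of choice:

```python
import json

def build_hec_request(host, token, event, index="main"):
    """Build the URL, headers, and JSON body for a Splunk HTTP Event
    Collector event (hypothetical helper for illustration)."""
    url = f"https://{host}:8088/services/collector/event"
    headers = {"Authorization": f"Splunk {token}"}
    body = json.dumps({"index": index, "event": event})
    return url, headers, body

# Example: build a request for a test event (placeholder host and token)
url, headers, body = build_hec_request(
    "ec2-xx-xxx-xxx-x.us-east-2.compute.amazonaws.com",
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    {"message": "hello from emreksdemo"},
)
```

A 200 response with `{"text":"Success","code":0}` from the real endpoint would confirm the token works.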

You use it when configuring the Fluent Bit output.

Option 2: Set up Amazon OpenSearch Service

Your other log aggregation option is to use Amazon OpenSearch Service.

Provision an OpenSearch Service domain

Provisioning an OpenSearch Service domain is very straightforward. In this post, we provide a simple script and configuration to provision a basic domain. To do it yourself, refer to Creating and managing Amazon OpenSearch Service domains.

Before you start, get the ARN of the IAM role that you use to run the Spark jobs. If you created the EKS cluster with the provided script, go to the CloudFormation stack emr-eks-iam-stack. On the Outputs tab, locate the IAMRoleArn output and copy this ARN. We also modify the IAM role later on, after we create the OpenSearch Service domain.

If you're using the provided opensearch.sh installer, modify the file before you run it.

From the root folder of the GitHub repository, cd to opensearch and modify opensearch.sh (you can also use your preferred editor):

[../aws-emr-eks-log-forwarding] $ cd opensearch
[../aws-emr-eks-log-forwarding/opensearch] $ vi opensearch.sh

Configure opensearch.sh to fit your environment, for example:

# name of our Amazon OpenSearch cluster
export ES_DOMAIN_NAME="emreksdemo"

# Elasticsearch version
export ES_VERSION="OpenSearch_1.0"

# Instance Type
export INSTANCE_TYPE="t3.small.search"

# OpenSearch Dashboards admin user
export ES_DOMAIN_USER="emreks"

# OpenSearch Dashboards admin password
export ES_DOMAIN_PASSWORD='< ADD YOUR PASSWORD >'

# Region
export REGION='us-east-1'

Run the script:

[../aws-emr-eks-log-forwarding/opensearch] $ bash opensearch.sh

Configure your OpenSearch Service domain

After you set up your OpenSearch Service domain and it's active, make the following configuration changes to allow logs to be ingested into Amazon OpenSearch Service:
a685
a685
- a685
- a685 On the Amazon OpenSearch Service a685 console, on the a685 Domains a685 web page, select your a685 area.
a685
a685
a685
a685
a685
a685
a685
- a685
- a685 On the a685 Safety configuration a685 tab, select a685 Edit a685 .
a685
a685
a685
a685
a685
a685
a685
- a685
- a685 For a685 Entry Coverage a685 , choose a685 Solely use fine-grained entry management a685 .
- a685 Select a685 Save modifications a685 .
a685
a685
a685
a685
a685
a685
a685
a685 The entry coverage ought to a685 appear like the next code:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:xx-xxxx-x:xxxxxxxxxxxx:domain/emreksdemo/*"
    }
  ]
}

- When the domain is active again, copy the domain ARN.

We use it to configure the Amazon EMR job IAM role we mentioned earlier.

- Choose the link for OpenSearch Dashboards URL to enter Amazon OpenSearch Service Dashboards.

- In Amazon OpenSearch Service Dashboards, log in with the user name and password that you configured earlier in the opensearch.sh file.
- Choose the options icon and choose Security under OpenSearch Plugins.

- Choose Roles.
- Choose Create role.

- Enter the new role's name, cluster permissions, and index permissions. For this post, name the role fluentbit_role and give cluster permissions to the following:
  - indices:admin/create
  - indices:admin/template/get
  - indices:admin/template/put
  - cluster:admin/ingest/pipeline/get
  - cluster:admin/ingest/pipeline/put
  - indices:data/write/bulk
  - indices:data/write/bulk*
  - create_index

- In the Index permissions section, give write permission to the index fluent-*.
- On the Mapped users tab, choose Manage mapping.
- For Backend roles, enter the Amazon EMR job execution IAM role ARN to be mapped to the fluentbit_role role.
- Choose Map.

- To complete the security configuration, go to the IAM console and add the following inline policy to the EMR on EKS IAM role entered in the backend role. Replace the resource ARN with the ARN of your OpenSearch Service domain.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "es:ESHttp*"
            ],
            "Resource": "arn:aws:es:us-east-2:XXXXXXXXXXXX:domain/emreksdemo"
        }
    ]
}

The configuration of Amazon OpenSearch Service is complete and ready for ingestion of logs from the Fluent Bit sidecar container.

Configure the Fluent Bit sidecar container

We need to write two configuration files to configure a Fluent Bit sidecar container. The first is the Fluent Bit configuration itself, and the second is the Fluent Bit sidecar subprocess configuration that makes sure the sidecar operation ends when the main Spark job ends. The suggested configuration provided in this post is for Splunk and Amazon OpenSearch Service. However, you can configure Fluent Bit with other third-party log aggregators. For more information about configuring outputs, refer to Outputs.

Fluent Bit ConfigMap

The following sample ConfigMap is from the GitHub repo:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-sidecar-config
  namespace: sparkns
  labels:
    app.kubernetes.io/name: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-application.conf
    @INCLUDE input-event-logs.conf
    @INCLUDE output-splunk.conf
    @INCLUDE output-opensearch.conf

  input-application.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/spark/user/*/*
        Path_Key          filename
        Buffer_Chunk_Size 1M
        Buffer_Max_Size   5M
        Skip_Long_Lines   On
        Skip_Empty_Lines  On

  input-event-logs.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/spark/apps/*
        Path_Key          filename
        Buffer_Chunk_Size 1M
        Buffer_Max_Size   5M
        Skip_Long_Lines   On
        Skip_Empty_Lines  On

  output-splunk.conf: |
    [OUTPUT]
        Name          splunk
        Match         *
        Host          < INTERNAL DNS of Splunk EC2 Instance >
        Port          8088
        TLS           On
        TLS.Verify    Off
        Splunk_Token  < Token as provided by the HTTP Event Collector in Splunk >

  output-opensearch.conf: |
    [OUTPUT]
        Name          es
        Match         *
        Host          < HOST NAME of the OpenSearch Domain | No HTTP protocol >
        Port          443
        TLS           On
        AWS_Auth      On
        AWS_Region    < Region >
        Retry_Limit   6

In your AWS Cloud9 workspace, modify the ConfigMap accordingly. Provide the values for the placeholder text by running the following commands to enter the vi editor. If preferred, you can use PICO or a different editor:

[../aws-emr-eks-log-forwarding] $ cd kube/configmaps
[../aws-emr-eks-log-forwarding/kube/configmaps] $ vi emr_configmap.yaml

# Modify the emr_configmap.yaml as above
# Save the file once it is completed

Complete either the Splunk output configuration or the Amazon OpenSearch Service output configuration.

Next, run the following commands to add the two Fluent Bit sidecar and subprocess ConfigMaps:

[../aws-emr-eks-log-forwarding/kube/configmaps] $ kubectl apply -f emr_configmap.yaml
[../aws-emr-eks-log-forwarding/kube/configmaps] $ kubectl apply -f emr_entrypoint_configmap.yaml

You don't need to modify the second ConfigMap because it's the subprocess script that runs inside the Fluent Bit sidecar container. To verify that the ConfigMaps have been installed, run the following command:

$ kubectl get cm -n sparkns
NAME                         DATA   AGE
fluent-bit-sidecar-config    6      15s
fluent-bit-sidecar-wrapper   2      15s

Set up a customized EMR container image

To run the sample PySpark script, the script requires the Boto3 package, which is not available in the standard EMR container images. If you want to run your own script and it doesn't require a customized EMR container image, you may skip this step.
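
A customized EMR on EKS image of this kind is typically built from a short Dockerfile along the following lines. This is a sketch only (the base image URI shown is the us-east-1 registry as an example, and the repo's actual Dockerfile may differ):

```dockerfile
# Base EMR on EKS Spark image (the registry account number varies by Region)
FROM 755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-6.5.0:latest

# Package installation requires root
USER root
RUN pip3 install boto3

# EMR on EKS expects the job to run as the hadoop user
USER hadoop:hadoop
```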

Run the following script:

[../aws-emr-eks-log-forwarding] $ cd ecr
[../aws-emr-eks-log-forwarding/ecr] $ bash create_custom_image.sh <region> <EMR container image account number>

The EMR container image account number can be obtained from select a base image URI. This documentation also provides the appropriate ECR registry account number. For example, the registry account number for us-east-1 is 755674844232.

To verify the repository and image, run the following commands:

$ aws ecr describe-repositories --region < region > | grep emr-6.5.0-custom
            "repositoryArn": "arn:aws:ecr:xx-xxxx-x:xxxxxxxxxxxx:repository/emr-6.5.0-custom",
            "repositoryName": "emr-6.5.0-custom",
            "repositoryUri": "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/emr-6.5.0-custom",

$ aws ecr describe-images --region < region > --repository-name emr-6.5.0-custom | jq .imageDetails[0].imageTags
[
  "latest"
]

Prepare pod templates for Spark jobs

Upload the two Spark driver and Spark executor pod templates to an S3 bucket and prefix. The two pod templates can be found in the GitHub repository:

- emr_driver_template.yaml – Spark driver pod template
- emr_executor_template.yaml – Spark executor pod template

The pod templates provided here should not be modified.

Submitting a Spark job with a Fluent Bit sidecar container

This Spark job example uses the bostonproperty.py script. To use this script, upload it to an accessible S3 bucket and prefix, and complete the preceding steps to use an EMR customized container image. You also need to upload the CSV file from the GitHub repo, which you need to download and unzip. Upload the unzipped file to the following location: s3://<your chosen bucket>/<first level folder>/data/boston-property-assessment-2021.csv.

The following commands assume that you launched your EKS cluster and virtual EMR cluster with the parameters indicated in the GitHub repo.

Variable | Where to Find the Information or the Value Required
EMR_EKS_CLUSTER_ID | Amazon EMR console virtual cluster page
EMR_EKS_EXECUTION_ARN | IAM role ARN
EMR_RELEASE | emr-6.5.0-latest
S3_BUCKET | The bucket you create in Amazon S3
S3_FOLDER | The preferred prefix you want to use in Amazon S3
CONTAINER_IMAGE | The URI in Amazon ECR where your container image is
SCRIPT_NAME | emreksdemo-script or a name you prefer
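
These variables feed into an EMR on EKS StartJobRun request. The following is a rough sketch of such a request, usable with `aws emr-containers start-job-run --cli-input-json file://job.json`; the angle-bracket placeholders correspond to the table's variables, and the exact request assembled by the repo's tooling may differ in detail:

```json
{
  "virtualClusterId": "<EMR_EKS_CLUSTER_ID>",
  "name": "emreksdemo-job",
  "executionRoleArn": "<EMR_EKS_EXECUTION_ARN>",
  "releaseLabel": "emr-6.5.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<S3_BUCKET>/<S3_FOLDER>/scripts/bostonproperty.py",
      "sparkSubmitParameters": "--conf spark.executor.instances=1 --conf spark.kubernetes.container.image=<CONTAINER_IMAGE>"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.kubernetes.driver.podTemplateFile": "s3://<S3_BUCKET>/<S3_FOLDER>/emr_driver_template.yaml",
          "spark.kubernetes.executor.podTemplateFile": "s3://<S3_BUCKET>/<S3_FOLDER>/emr_executor_template.yaml"
        }
      }
    ]
  }
}
```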

Alternatively, use the provided script to run the job. Change the directory to the scripts folder in emreks and run the script as follows:

[../aws-emr-eks-log-forwarding] cd emreks/scripts
[../aws-emr-eks-log-forwarding/emreks/scripts] bash run_emr_script.sh < S3 bucket name > < ECR container image > < script path >

Example: bash run_emr_script.sh emreksdemo-123456 12345678990.dkr.ecr.us-east-2.amazonaws.com/emr-6.5.0-custom s3://emreksdemo-123456/scripts/scriptname.py

After you submit the Spark job successfully, you get a JSON response like the following:

{
  "id": "0000000305e814v0bpt",
  "name": "emreksdemo-job",
  "arn": "arn:aws:emr-containers:xx-xxxx-x:XXXXXXXXXXX:/virtualclusters/upobc00wgff5XXXXXXXXXXX/jobruns/0000000305e814v0bpt",
  "virtualClusterId": "upobc00wgff5XXXXXXXXXXX"
}

What happens when you submit a Spark job with a sidecar container

After you submit a Spark job, you can see what is happening by viewing the pods that are generated and the corresponding logs. First, using kubectl, get a list of the pods generated in the namespace where the EMR virtual cluster runs. In this case, it's sparkns. The first pod in the following code is the job controller for this particular Spark job. The second pod is the Spark executor; there can be more than one pod, depending on how many executor instances are requested in the Spark job settings (we asked for one here). The third pod is the Spark driver pod.

$ kubectl get pods -n sparkns
NAME                                        READY   STATUS    RESTARTS   AGE
0000000305e814v0bpt-hvwjs                   3/3     Running   0          25s
emreksdemo-script-1247bf80ae40b089-exec-1   0/3     Pending   0          0s
spark-0000000305e814v0bpt-driver            3/3     Running   0          11s

To view what happens in the sidecar container, follow the logs in the Spark driver pod and look at the sidecar. The sidecar container launches with the Spark pods and persists until the file /var/log/fluentd/main-container-terminated is no longer available. For more information about how Amazon EMR controls the pod lifecycle, refer to Using pod templates. The subprocess script ties the sidecar container to this same lifecycle and terminates itself based on the EMR-managed pod lifecycle process.
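
The wrapper's core logic can be sketched as a small shell function. This is illustrative only; the repo's actual subprocess script is more elaborate, and the function name here is hypothetical:

```shell
# wait_for_termination FILE CMD...
# Block until FILE appears (the main container has started), then poll
# until FILE disappears (the main container has terminated), then run
# CMD (for example, a command that stops the Fluent Bit process).
wait_for_termination() {
  local file="$1"
  shift
  while [ ! -f "$file" ]; do sleep 0.5; done   # wait for heartbeat file
  while [ -f "$file" ]; do sleep 0.5; done     # poll while the job is alive
  "$@"                                         # job ended: shut down forwarder
}
```

In the real sidecar, the heartbeat file is /var/log/fluentd/main-container-terminated and the final command kills the Fluent Bit process, which matches the log output that follows.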


$ kubectl logs spark-0000000305e814v0bpt-driver -n sparkns -c custom-side-car-container --follow=true
Waiting for file /var/log/fluentd/main-container-terminated to appear...
AWS for Fluent Bit Container Image Version 2.24.0
Start wait: 1652190909
Elapsed Wait: 0
Not found count: 0
Waiting...
Fluent Bit v1.9.3
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2022/05/10 13:55:09] [ info] [fluent bit] version=1.9.3, commit=9eb4996b7d, pid=11
[2022/05/10 13:55:09] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/05/10 13:55:09] [ info] [cmetrics] version=0.3.1
[2022/05/10 13:55:09] [ info] [output:splunk:splunk.0] worker #0 started
[2022/05/10 13:55:09] [ info] [output:splunk:splunk.0] worker #1 started
[2022/05/10 13:55:09] [ info] [output:es:es.1] worker #0 started
[2022/05/10 13:55:09] [ info] [output:es:es.1] worker #1 started
[2022/05/10 13:55:09] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/05/10 13:55:09] [ info] [sp] stream processor started
Waiting for file /var/log/fluentd/main-container-terminated to appear...
Last heartbeat: 1652190914
Elapsed Time since after heartbeat: 0
Found count: 0
list files:
-rw-r--r-- 1 saslauth 65534 0 May 10 13:55 /var/log/fluentd/main-container-terminated
Last heartbeat: 1652190918
…
[2022/05/10 13:56:09] [ info] [input:tail:tail.0] inotify_fs_add(): inode=58834691 watch_fd=6 name=/var/log/spark/user/spark-0000000305e814v0bpt-driver/stdout-s3-container-log-in-tail.pos
[2022/05/10 13:56:09] [ info] [input:tail:tail.1] inotify_fs_add(): inode=54644346 watch_fd=1 name=/var/log/spark/apps/spark-0000000305e814v0bpt
Outside of loop, main-container-terminated file does not exist
ls: cannot access /var/log/fluentd/main-container-terminated: No such file or directory
The file /var/log/fluentd/main-container-terminated does not exist anymore;
TERMINATED PROCESS
Fluent-Bit pid: 11
Killing process after sleeping for 15 seconds
root        11     8  0 13:55 ?        00:00:00 /fluent-bit/bin/fluent-bit -e /fluent-bit/firehose.so -e /fluent-bit/cloudwatch.so -e /fluent-bit/kinesis.so -c /fluent-bit/etc/fluent-bit.conf
root       114     7  0 13:56 ?        00:00:00 grep fluent
Killing process 11
[2022/05/10 13:56:24] [engine] caught signal (SIGTERM)
[2022/05/10 13:56:24] [ info] [input] pausing tail.0
[2022/05/10 13:56:24] [ info] [input] pausing tail.1
[2022/05/10 13:56:24] [ warn] [engine] service will shutdown in max 5 seconds
[2022/05/10 13:56:25] [ info] [engine] service has stopped (0 pending tasks)
[2022/05/10 13:56:25] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=54644346 watch_fd=1
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=60917120 watch_fd=1
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=60917121 watch_fd=2
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834690 watch_fd=3
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834692 watch_fd=4
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834689 watch_fd=5
[2022/05/10 13:56:25] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=58834691 watch_fd=6
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #0 stopping...
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #0 stopped
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #1 stopping...
[2022/05/10 13:56:25] [ info] [output:splunk:splunk.0] thread worker #1 stopped
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #0 stopping...
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #0 stopped
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #1 stopping...
[2022/05/10 13:56:25] [ info] [output:es:es.1] thread worker #1 stopped

View the forwarded logs in Splunk or Amazon OpenSearch Service


To view the forwarded logs, do a search in Splunk or on the Amazon OpenSearch Service console. If you're using a shared log aggregator, you may have to filter the results. In this configuration, the logs tailed by Fluent Bit are under /var/log/spark/*. The following screenshots show the logs generated specifically by the Kubernetes Spark driver stdout that were forwarded to the log aggregators. You can compare the results with the logs provided using kubectl:


kubectl logs < Spark Driver Pod > -n < namespace > -c spark-kubernetes-driver --follow=true
…
root
 |-- PID: string (nullable = true)
 |-- CM_ID: string (nullable = true)
 |-- GIS_ID: string (nullable = true)
 |-- ST_NUM: string (nullable = true)
 |-- ST_NAME: string (nullable = true)
 |-- UNIT_NUM: string (nullable = true)
 |-- CITY: string (nullable = true)
 |-- ZIPCODE: string (nullable = true)
 |-- BLDG_SEQ: string (nullable = true)
 |-- NUM_BLDGS: string (nullable = true)
 |-- LUC: string (nullable = true)
…
|02108|RETAIL CONDO           |361450.0            |63800.0        |5977500.0      |
|02108|RETAIL STORE DETACH    |2295050.0           |988200.0       |3601900.0      |
|02108|SCHOOL                 |1.20858E7           |1.20858E7      |1.20858E7      |
|02108|SINGLE FAM DWELLING    |5267156.561085973   |1153400.0      |1.57334E7      |
+-----+-----------------------+--------------------+---------------+---------------+
only showing top 50 rows

The following screenshot shows the Splunk logs.




The following screenshots show the Amazon OpenSearch Service logs.


Optional: Include a buffer between Fluent Bit and the log aggregators


If you expect to generate a large volume of logs because of highly concurrent Spark jobs creating many individual connections that may overwhelm your Amazon OpenSearch Service or Splunk log aggregation clusters, consider adding a buffer between the Fluent Bit sidecars and your log aggregator. One option is to use Amazon Kinesis Data Firehose as the buffering service.


Kinesis Data Firehose has built-in delivery to both Amazon OpenSearch Service and Splunk. If you're using Amazon OpenSearch Service, refer to Loading streaming data from Amazon Kinesis Data Firehose. If you're using Splunk, refer to Configure Amazon Kinesis Firehose to send data to the Splunk platform and Choose Splunk for Your Destination.


To configure Fluent Bit to send to Kinesis Data Firehose, add the following to your ConfigMap output. Refer to the GitHub ConfigMap example and add the @INCLUDE under the [SERVICE] section:


    @INCLUDE output-kinesisfirehose.conf
…
output-kinesisfirehose.conf: |
  [OUTPUT]
    Name            kinesis_firehose
    Match           *
    region          < region >
    delivery_stream < Kinesis Firehose Stream Name >
Optional: Use data streams for Amazon OpenSearch Service


If you're in a scenario where the number of documents grows rapidly and you don't need to update older documents, you need to manage the OpenSearch Service cluster. This involves steps like creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indexes. Consider using data streams to simplify this process and enforce a setup that best suits your time series data. For instructions on implementing data streams, refer to Data streams.
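As a sketch of what that setup can look like, a data stream in OpenSearch is backed by an index template that declares a data_stream object and an index pattern. The template name and the spark-logs-* pattern below are hypothetical; the pattern should match whatever index name your Fluent Bit es output writes to:

```
PUT _index_template/spark-logs-template
{
  "index_patterns": ["spark-logs-*"],
  "data_stream": {},
  "priority": 100,
  "template": {
    "settings": { "number_of_shards": 1 }
  }
}
```

With a template like this in place, the first write to a matching index name creates the data stream, and rollover of the backing indexes is handled for you instead of through manually managed rollover aliases.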


Clean up


To avoid incurring future charges, delete the resources by deleting the CloudFormation stacks that were created with this script. This removes the EKS cluster. However, before you do that, remove the EMR virtual cluster first by running the delete-virtual-cluster command. Then delete all the CloudFormation stacks generated by the deployment script.


If you launched an OpenSearch Service domain, you can delete the domain from the OpenSearch Service console. If you used the script to launch a Splunk instance, you can go to the CloudFormation stack that launched the Splunk instance and delete it. This removes the Splunk instance and associated resources.


You can also use the following scripts to clean up resources:

Conclusion


EMR on EKS facilitates running Spark jobs on Kubernetes to achieve very fast and cost-efficient Spark operations. This is made possible by scheduling transient pods that are launched and then deleted when the jobs are complete. To log all these operations in the same lifecycle as the Spark jobs, this post provides a solution using pod templates and Fluent Bit that is lightweight and powerful. This approach offers a decoupled method of log forwarding based at the Spark application level and not at the Kubernetes cluster level. It also avoids routing through intermediaries like CloudWatch, reducing cost and complexity. In this way, you can address security concerns and ease of management for DevOps and system administration while providing Spark users with insights into their Spark jobs in a cost-efficient and functional way.


If you have questions or suggestions, please leave a comment.


About the Author


Matthew Tan is a Senior Analytics Solutions Architect at Amazon Web Services and provides guidance to customers developing solutions with AWS Analytics services for their analytics workloads.