1. © 2018 Bloomberg Finance L.P. All rights reserved.
DataWorks Summit Berlin 2018
April 19, 2018
Artem Ervits – Hortonworks
Clay Baenziger – Bloomberg
Breathing New Life into Apache Oozie
with Apache Ambari Workflow Manager
2.
Poll:
• Who here uses Oozie?
— In production?
— Do you use HUE with Oozie?
— How many workflows do you have in production?
1-10? 10-50? 50+?
— How many actions does the largest workflow contain?
1-10? 10-50? 50+?
— Do you use (or want to use) Oozie with:
HBase? Spark? Python? Deployment Automation?
• Do you like XML?
— Do you have a favorite editor for Oozie workflows?
3.
Open Source Workflow Managers
• Apache Airflow (Incubating)
• Luigi by Spotify
• Azkaban by LinkedIn
• (And of course) Apache Oozie
4.
Introduction to Oozie
• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
• Oozie coordinator jobs are recurrent Oozie workflow jobs triggered by time and data availability.
• Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs out of the box.
• Oozie is a scalable, reliable and extensible system.
- Paraphrased from http://oozie.apache.org
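To make "triggered by time and data availability" concrete, here is a minimal coordinator sketch. The application name, dates, dataset path and the `workflowAppUri` and `inputDir` properties are hypothetical placeholders, not taken from the talk. The coordinator materializes one run per day, and each run waits until that day's input directory contains a `_SUCCESS` flag.

```xml
<coordinator-app name="daily-ingest" frequency="${coord:days(1)}"
                 start="2018-04-19T00:00Z" end="2018-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <!-- Data trigger: one dataset instance per day (path is a placeholder) -->
    <datasets>
        <dataset name="raw-logs" frequency="${coord:days(1)}"
                 initial-instance="2018-04-19T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/raw/${YEAR}/${MONTH}/${DAY}</uri-template>
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <!-- Each run depends on the dataset instance for its own nominal day -->
    <input-events>
        <data-in name="input" dataset="raw-logs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <!-- Hand the resolved input directory to the workflow -->
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```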
Actions:
• Map/Reduce
• Hive
• Pig
• HDFS
• Java
• Shell
• Spark
• Sub-Workflow
• E-Mail
• Decision
• Fork
• Join
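As a rough illustration of how these node types combine into a DAG, the sketch below forks into two HDFS (`fs`) actions, joins them, and then uses a decision node. All node names and paths are made up for the example.

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="split"/>

    <!-- Fork: both branches run in parallel -->
    <fork name="split">
        <path start="make-dir-a"/>
        <path start="make-dir-b"/>
    </fork>

    <action name="make-dir-a">
        <fs>
            <mkdir path="${nameNode}/tmp/demo/a"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <action name="make-dir-b">
        <fs>
            <mkdir path="${nameNode}/tmp/demo/b"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <!-- Join: waits for every forked path before continuing -->
    <join name="merge" to="check"/>

    <!-- Decision: route on an EL predicate -->
    <decision name="check">
        <switch>
            <case to="end">${fs:exists(concat(nameNode, '/tmp/demo/a'))}</case>
            <default to="fail"/>
        </switch>
    </decision>

    <kill name="fail">
        <message>Workflow failed at [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Even this small graph runs to some forty lines of XML, which is the verbosity the rest of the talk addresses.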
5.
Oozie Release Timeline
• 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs.
• 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs.
• 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials.
• 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability.
• 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code cleanup.
- Adapted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
6.
Oozie Complaints
• Launcher jobs as map tasks
• Dated UI
• Confusing object model – workflows, coordinators, bundles
• Complicated setup
• XML
• DAG visualization
• SLA alerting
• Fine grained authorization
• Easy access to log files
7.
Oozie Complaints Improvements
• Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770
• Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today)
• Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339
• Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666
• XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339
• DAG visualization – solved by Oozie 5.0.0, OOZIE-2406
• SLA alerting – since Oozie 4.0.0, OOZIE-1294
• Fine grained authorization – targeted for Oozie 5.X, OOZIE-3196
• Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
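For the SLA alerting item, a minimal sketch of an action-level SLA definition is shown here. The workflow and action names, the thresholds, the contact address and the `nominal_time` job property are illustrative assumptions. The `sla` namespace has to be declared on the workflow element, and the Oozie server must have its SLA services enabled for these settings to take effect.

```xml
<workflow-app name="sla-demo-wf" xmlns="uri:oozie:workflow:0.5"
              xmlns:sla="uri:oozie:sla:0.2">
    <start to="touch-marker"/>
    <action name="touch-marker">
        <fs>
            <touchz path="${nameNode}/tmp/demo/marker"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
        <!-- SLA expectations, measured relative to the nominal time -->
        <sla:info>
            <sla:nominal-time>${nominal_time}</sla:nominal-time>
            <sla:should-start>${10 * MINUTES}</sla:should-start>
            <sla:should-end>${30 * MINUTES}</sla:should-end>
            <sla:max-duration>${30 * MINUTES}</sla:max-duration>
            <sla:alert-events>start_miss,end_miss,duration_miss</sla:alert-events>
            <sla:alert-contact>oncall@example.com</sla:alert-contact>
        </sla:info>
    </action>
    <kill name="fail">
        <message>Action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```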
8.
Oozie Launcher – Prior to Release 5.0
• MR launcher job
9.
Oozie Launcher – Release 5.0
• OYA: OOZIE-1770: Create Oozie Application Master for YARN
— Removes MR launcher job
• Design Doc
10.
Oozie Documentation – Before Release 5.0 and After
Documentation redesign
OOZIE-3163: Improve documentation rendering: use fluido skin and better config
11.
Oozie Workflow Visualization – Prior to 5.0 and After
Before: JUNG. After: GraphViz.
OOZIE-2406: Completely rewrite GraphGenerator code
12.
Oozie Fluent Job API – Apache Oozie 5.X (Preview)
OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
13.
Apache Ambari
Ambari Provides:
• Provisioning of a Hadoop Cluster
• Management of a Hadoop Cluster
• Monitoring of a Hadoop Cluster
— A Metrics System for metrics collection
— An Alert Framework
— A dashboard for monitoring the Hadoop cluster
- Paraphrased from http://ambari.apache.org
14.
Ambari Views
• Ambari Views “offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A "view" is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.”
• Key take-aways:
— One does not need an Ambari-managed (administered) cluster
— Third parties can build view packages to run in the Ambari framework too
— Major views available:
(YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager
• Alternatives: Cloudera Hue, bespoke applications
15.
Workflow Manager – Motivation
• Oozie workflows are defined in XML – too verbose
— Provide GUI workflow builder and editor
— Reduce possibility of user introduced errors
— Provide browser based workflow manager
• Integration with File Browser (includes S3 support)
— Tighter integration with Ambari in future
— Can replace existing Oozie web UI
• Oozie is hard-coded to display only 25 actions
— WFM doesn’t have this limit; tested with 300+ action nodes
• Oozie is scalable
— Can scale WFM by standing up multiple Ambari Views servers
16.
Workflow Manager – Workflow Designer Example
Workflow Manager:
• Available as an Ambari View
• Enables visual editing of Oozie workflows
• Integrated with file browser
• Reduces user input errors
• Minimal input required
17.
Workflow Manager – Execution View
• Integrated Dashboard with Workflow Manager View
• Manage Oozie jobs
• Drill down to logs
18.
Workflow Manager – Workflow Design Component
19.
Workflow Manager – Workflow Dashboard Component
Good Documentation: HDP 2.6 – Workflow Manager Basics
20.
Workflow Manager Examples with HBase
• Setup
• Data Definition
• Compactions
21.
HBase – Setup
Oozie needs HBase Configuration:
• Oozie Server Code (to support HBase delegation tokens)
— In libexec (see Server JARs list)
— In oozie-site.xml
<property>
<name>oozie.credentials.credentialclasses</name>
<value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value>
</property>
• Client Workflow Code:
— Add to workflow.xml:
<credentials>
<credential name="myhbase_creds" type="hbase">
[…]
</credential>
</credentials>
— All your normal HBase security settings in the credential section
• Server JARs:
(Copy the following to Oozie’s libexec)
— hbase-common.jar
— hbase-client.jar
— hbase-server.jar
— hbase-protocol.jar
— hbase-hadoop2-compat.jar
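As a hedged illustration of what "normal HBase security settings" might look like inside the credential, the sketch below uses standard Hadoop and HBase configuration keys for a Kerberos-secured cluster. Which keys are actually needed depends on the cluster, and every value here (realm, principals, ZooKeeper hosts) is a placeholder.

```xml
<credentials>
    <credential name="myhbase_creds" type="hbase">
        <!-- Assumed Kerberos-secured cluster; all values are placeholders -->
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
        <property>
            <name>hbase.security.authentication</name>
            <value>kerberos</value>
        </property>
        <property>
            <name>hbase.master.kerberos.principal</name>
            <value>hbase/_HOST@EXAMPLE.COM</value>
        </property>
        <property>
            <name>hbase.regionserver.kerberos.principal</name>
            <value>hbase/_HOST@EXAMPLE.COM</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
        </property>
    </credential>
</credentials>
```

An action opts in to the credential by naming it in its `cred` attribute, and the name must match the `name` given here.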
22.
HBase – Data Definition
create_my_table.rb:
# Create my_table only if it does not already exist
tables = list.select { |table| table.eql?('my_table') }
if tables.empty?
  create 'my_table', {NAME => 'my_col'}
end
exit
HBase Shell:
<action name="HBASE-Shell" cred="hbase_creds">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>hbase</exec>
        <argument>shell</argument>
        <argument>-n</argument>
        <argument>create_my_table.rb</argument>
    </shell>
    <ok to="do_more_things"/>
    <error to="fail"/>
</action>
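The action refers to create_my_table.rb by bare file name, so the script has to be present in the action's working directory at run time. One way to arrange that, sketched here under the assumption that the script lives in HDFS at a path held in a hypothetical `scriptPath` job property, is the shell action's `<file>` element:

```xml
<action name="HBASE-Shell" cred="hbase_creds">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>hbase</exec>
        <argument>shell</argument>
        <argument>-n</argument>
        <argument>create_my_table.rb</argument>
        <!-- Localize the script; the part after '#' is the local link name -->
        <file>${scriptPath}#create_my_table.rb</file>
    </shell>
    <ok to="do_more_things"/>
    <error to="fail"/>
</action>
```

Note that the `cred` value must match the credential name declared in the workflow's `<credentials>` section.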
23.
HBase – Compactions
HBASE-19528: Major Compaction Tool
• Automatically scales compaction to selected number of servers
• Requires read ability to /hbase
usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg>
[...]
Usage instructions
-cf <arg>          column families: comma separated eg: a,b,c
-dryRun            Dry run, will just output a list of regions that require compaction based on parameters passed
-minModTime <arg>  Compact if store files have modification time < minModTime
-servers <arg>     Concurrent servers compacting
-table <arg>       table name
...
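A compaction run could be scheduled from Oozie with a shell action along the following lines. This is a sketch only: the fully qualified class name is an assumption based on where the tool lives in the HBase source tree and should be checked against your HBase version, and the table name, server count and transition targets are placeholders. It uses only flags from the usage text above, and `-dryRun` is left on so the action reports candidate regions without compacting them.

```xml
<action name="major-compact" cred="hbase_creds">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>hbase</exec>
        <!-- Assumed class name; verify against your HBase release -->
        <argument>org.apache.hadoop.hbase.util.compaction.MajorCompactor</argument>
        <argument>-table</argument>
        <argument>my_table</argument>
        <argument>-servers</argument>
        <argument>5</argument>
        <!-- Remove -dryRun to actually trigger compactions -->
        <argument>-dryRun</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```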
24.
More Resources
• Oozie Examples: https://github.com/dbist/oozie-examples
• Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html
• Artem’s 12 Part Series on WFM: http://bit.ly/2syKUIh
• Clay’s Past Oozie presentations:
— Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj
— HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
25.
Demo!
26.
Questions?