Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. The container images defined in this repository are used to build the pre-built images that run Spark jobs on Amazon SageMaker through the SageMaker Python SDK. The pre-built images are available in the Amazon Elastic Container Registry (Amazon ECR), and this repository also serves as a reference for those who want to build their own customized Spark containers for use in Amazon SageMaker.

Features

  • Licensed under the Apache-2.0 License
  • Pre-built images, used through the SageMaker Python SDK, are the simplest way to get started
  • Building and testing the container requires setting up a local development environment
  • Many pre-built SageMaker Spark images are available
  • The repository builds the pre-built container images used when running Spark jobs on Amazon SageMaker
  • Apache Spark provides high-level APIs in Scala, Java, Python, and R
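
As a rough illustration of the pre-built-image route mentioned in the feature list, the sketch below uses the PySparkProcessor class from the SageMaker Python SDK. The role ARN, script path, Spark version "3.1", and instance settings are placeholder assumptions, not values taken from this project, and the helper names build_processor_kwargs and run_spark_job are hypothetical. The SDK import is deferred so the configuration helper can be exercised without the SDK or AWS access.

```python
from typing import Any, Dict, List, Optional


def build_processor_kwargs(
    role_arn: str,
    framework_version: str = "3.1",      # assumed Spark version, adjust as needed
    instance_count: int = 2,             # assumed cluster size
    instance_type: str = "ml.m5.xlarge", # assumed instance type
) -> Dict[str, Any]:
    """Collect constructor arguments for a PySparkProcessor (hypothetical helper)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "base_job_name": "sm-spark-example",  # placeholder job name prefix
        "framework_version": framework_version,
        "role": role_arn,
        "instance_count": instance_count,
        "instance_type": instance_type,
    }


def run_spark_job(
    role_arn: str,
    script_path: str,
    arguments: Optional[List[str]] = None,
) -> None:
    """Submit a PySpark script as a SageMaker Processing job (hypothetical helper).

    Requires the `sagemaker` package and valid AWS credentials.
    """
    # Imported lazily so the rest of this module works without the SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(**build_processor_kwargs(role_arn))
    processor.run(submit_app=script_path, arguments=arguments or [])
```

Calling run_spark_job("arn:aws:iam::123456789012:role/ExampleRole", "./preprocess.py") would submit the job, assuming valid credentials and an existing role. With no explicit image given, the SDK is expected to pick the pre-built Spark image that matches the framework version.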

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Frameworks, Python Business Performance Management Software, Python Data Analytics Tool, Python Stream Processing Tool

Registered

2022-07-04