GitHub - tauovir/pyspark

Spark:

Apache Spark is an open-source analytics engine for big data workloads. It handles both batch and real-time analytics and data processing. Spark provides native bindings for the Java, Scala, Python, and R programming languages. It also includes libraries for building applications for machine learning [MLlib], stream processing [Spark Streaming], and graph processing [GraphX].

Apache Spark:

Performance: often cited as 10 to 100 times faster than Hadoop MapReduce (M/R).
Ease of development: Spark SQL, high-performance SQL engine, APIs.
Language support: Java, Scala, Python, R.
Storage: HDFS, cloud storage.
Resource management: YARN, Mesos, Kubernetes.

Run in Two Setups

With Hadoop (data lake)
Without Hadoop (lakehouse: cloud)

Working with Spark

Spark DataFrame and API
Spark Database and SQL

Folders and files (72 commits)

.vs
data
pysrc
src
src_notebok
.gitignore
README.md
base-image.jpg
delta.ipynb
