tauovir/pyspark

Spark:

Apache Spark is an open-source analytics engine for big data workloads. It handles both batch and real-time analytics and data processing. Spark provides native bindings for the Java, Scala, Python, and R programming languages. It also includes several libraries for building applications for machine learning [MLlib], stream processing [Spark Streaming], and graph processing [GraphX].


Apache Spark:

  • Performance: often cited as 10 to 100 times faster than Hadoop MapReduce, largely because intermediate data is kept in memory.
  • Ease of development: Spark SQL (a high-performance SQL engine) and high-level APIs.
  • Language support: Java, Scala, Python, R.
  • Storage: HDFS, cloud storage.
  • Resource management: YARN, Mesos, Kubernetes.

Run in Two Setups

  • With Hadoop (data lake)
  • Without Hadoop (lakehouse: cloud)
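From application code, the difference between the two setups mostly shows up in the cluster manager passed to the session builder and in the URI scheme of the data path. The following is a minimal sketch under that assumption; the host names, ports, bucket name, and the `SETUPS` and `build_session` names are hypothetical placeholders, not part of this repository.

```python
# Hypothetical settings for the two setups; hosts, ports and paths are placeholders.
SETUPS = {
    "with_hadoop": {
        "master": "yarn",  # YARN schedules the job on the Hadoop cluster
        "data_path": "hdfs://namenode:8020/data/events",
    },
    "without_hadoop": {
        "master": "k8s://https://k8s-api.example.com:6443",  # Kubernetes as cluster manager
        "data_path": "s3a://example-bucket/data/events",  # cloud object storage
    },
}


def build_session(setup_name, app_name="setup-demo"):
    """Create a SparkSession for one of the setups defined in SETUPS."""
    # Imported here so the settings can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    setup = SETUPS[setup_name]
    return (
        SparkSession.builder
        .master(setup["master"])
        .appName(app_name)
        .getOrCreate()
    )
```

Calling `build_session("with_hadoop")` needs a reachable cluster and the matching client configuration. To try the same code on a single machine, one option is to set the `master` value to `"local[*]"` and point `data_path` at a local directory.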

Working with Spark

  • Spark DataFrame API
  • Spark databases and Spark SQL
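The two ways of working listed above can express the same query. The sketch below computes one aggregation with the DataFrame API and again with Spark SQL over a temporary view. The sample rows, the `employees` view name, and the `average_salaries` helper are made up for illustration and are not taken from this repository.

```python
# Hypothetical sample data: (name, department, salary).
ROWS = [
    ("Asha", "engineering", 120),
    ("Ben", "engineering", 100),
    ("Chen", "sales", 90),
    ("Dina", "sales", 70),
]
COLUMNS = ["name", "department", "salary"]

# SQL equivalent of the DataFrame query built in average_salaries().
AVG_SALARY_SQL = """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY department
"""


def average_salaries(spark):
    """Return the same aggregation built via the DataFrame API and via SQL."""
    # Imported here so the sample data can be inspected without PySpark installed.
    from pyspark.sql import functions as F

    df = spark.createDataFrame(ROWS, COLUMNS)

    # DataFrame API: group, aggregate, sort.
    via_api = (
        df.groupBy("department")
        .agg(F.avg("salary").alias("avg_salary"))
        .orderBy("department")
    )

    # Spark SQL: register the DataFrame as a temporary view, then query it.
    df.createOrReplaceTempView("employees")
    via_sql = spark.sql(AVG_SALARY_SQL)

    return via_api, via_sql
```

With a local session, for example `SparkSession.builder.master("local[*]").getOrCreate()`, calling `.show()` on each returned DataFrame should print the same per-department averages.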
