Draft: Use AWS S3 as Storage Backend #1010
Conversation
```scala
def resolveOne[A](propName: String,
                  envName: String,
                  builder: String => A): ZIO[system.System, RuntimeException, A] = {
  zio.system.properties.flatMap { map =>
```
You can use something like

```scala
zio.system.property(propName).some
  .orElse(zio.system.env(envName).some)
  .orElseFail(new RuntimeException(s"Cannot find system property $propName or environment variable $envName"))
```
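Applied to the `resolveOne` helper quoted above, the suggestion might look like the following sketch. It assumes ZIO 1.x, where `zio.system.property` and `zio.system.env` return the value as an `Option[String]`; applying `builder` at the end is carried over from the original signature, not part of the reviewer's snippet.

```scala
import zio.ZIO
import zio.system

// Sketch (ZIO 1.x): resolve a value from a system property, falling back
// to an environment variable, using the suggested combinator chain.
// `.some` unwraps the Option success; `.orElseFail` replaces the error.
def resolveOne[A](propName: String,
                  envName: String,
                  builder: String => A): ZIO[system.System, RuntimeException, A] =
  system.property(propName).some
    .orElse(system.env(envName).some)
    .orElseFail(new RuntimeException(
      s"Cannot find system property $propName or environment variable $envName"))
    .map(builder)
```

This keeps the same signature while dropping the manual `properties.flatMap` plumbing.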
build.sbt
Outdated
```diff
   "com.vladsch.flexmark" % "flexmark-ext-yaml-front-matter" % "0.34.32",
-  "org.slf4j" % "slf4j-simple" % "1.7.25"
+  "org.slf4j" % "slf4j-simple" % "1.7.25",
+  "net.java.dev.jets3t" % "jets3t" % "0.7.1" // same as Spark one
```
Thanks for using the same thing as Spark, but I wonder if this will cause conflicts across different versions of Spark.

This is a tricky one... I think in the Spark case (which I'd estimate is the majority)

Edit: Actually, I think we don't have to worry about conflicts, given some of the packaging/deployment changes I'm working on. But still, I think I'd rather see S3 support be a plug-in, which means we at least have to figure out the plugin hook for filesystems and get the repository mechanism to support URIs.
```scala
override def init(path: Path): RIO[BaseEnv, Unit] = ZIO.unit
```
```scala
def getBucket(p: Path): RIO[BaseEnv, S3Bucket] = {
```
Should the bucket (and optionally a base path) be part of the notebook repository instead? Then, paths would just be paths. You'd parameterize the filesystem with the bucket and base path, and create a new type of `NotebookRepository` which is configured with the base S3 URI and passes the bucket and base path to the filesystem (or instead, adapt `FileBasedRepository` to accept a base URI instead of a base path... some of the plumbing for this exists, but is unused)
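The configuration flow this comment describes could be sketched as follows. All names here (`S3Base`, `parseS3Base`) are illustrative, not part of Polynote's actual API: a repository configured with a base S3 URI would parse out the bucket and base path once, then hand them to the filesystem so paths stay plain paths.

```scala
import java.net.URI

// Hypothetical holder for the pieces a repository would pass to the
// filesystem; the name S3Base is illustrative, not Polynote's API.
final case class S3Base(bucket: String, basePath: String)

// Parse a base S3 URI (e.g. "s3://my-bucket/notebooks") into bucket
// and base path, using plain java.net.URI.
def parseS3Base(uri: String): Either[String, S3Base] = {
  val parsed = new URI(uri)
  if (parsed.getScheme != "s3") Left(s"Not an s3 URI: $uri")
  else if (parsed.getHost == null) Left(s"Missing bucket in: $uri")
  else Right(S3Base(parsed.getHost, parsed.getPath.stripPrefix("/")))
}
```

With something like this, `parseS3Base("s3://my-bucket/notebooks")` yields the bucket `my-bucket` and base path `notebooks`, and everything downstream of the repository only ever sees relative paths.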
If I have your "blessing", I would rework the `NotebookFilesystem` API to use URIs instead of `Path`s, which would simplify working with S3 a lot. Then I'm OK with moving the S3 implementation inside the Spark module and using whatever library is shipped with Spark (maybe Hadoop), but I'd love to have some hints on how to implement a plugin system for
Hi @jeremyrsmith, I pushed some changes, are they going in the right direction?
@tmnd1991 absolutely, I think a lot of things should be moved. I think configuration of plugin-based filesystems would ideally work similarly to how configuration of plugin-based authentication providers works. To be honest, we're discussing this internally right now, so that might be something to wait for before putting a lot more effort in here 😞
I have no rush :) if there's some way to "watch" the discussion, it would really be interesting to me. Anyway, I'll hold off until you've made a decision about those changes 😄
It's very drafty and incomplete, but I would love for any maintainer to have a look and point out any "big" mistakes. To me the most controversial part is that `java.nio.file.Path` is not ergonomic at all for S3 buckets and keys.
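A small sketch of the ergonomics problem: on the default Unix filesystem, stuffing an S3 location into a `java.nio.file.Path` silently collapses the `//` after the scheme, whereas `java.net.URI` keeps the bucket and key cleanly separated. The location string below is an example made up for illustration.

```scala
import java.net.URI
import java.nio.file.Paths

// An example S3 location (illustrative, not from the PR).
val location = "s3://my-bucket/notebooks/example.ipynb"

// As a Path, the default Unix filesystem normalizes away the double
// slash, mangling the URI (and on Windows this can throw instead).
val asPath = Paths.get(location).toString

// As a URI, bucket and key stay distinct and easy to extract.
val asUri  = new URI(location)
val bucket = asUri.getHost                  // "my-bucket"
val key    = asUri.getPath.stripPrefix("/") // "notebooks/example.ipynb"
```

This is why a URI-based `NotebookFilesystem` API (as discussed above) fits S3 much more naturally than `Path`-based one.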