SparkSession
  .builder
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "C:/tmp/spark")
  .config("spark.sql.streaming.checkpointLocation", "C:/tmp/spark/spark-checkpoint")
  .appName("my-test")
  .getOrCreate
  .readStream
  .schema(schema)
  .json("src/test/data")
  .cache
  .writeStream
  .start
  .awaitTermination
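For reference, here is the variant with the `.cache` call removed, which is the one that runs as intended for me. The wrapping object, the placeholder schema, and the `console` sink are my own additions so the snippet compiles standalone; the original does not specify them:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object StreamingRepro {
  def main(args: Array[String]): Unit = {
    // Placeholder schema for the test JSON files -- an assumption,
    // the real `schema` value is not shown in the snippet above.
    val schema = new StructType()
      .add("id", LongType)
      .add("value", StringType)

    SparkSession
      .builder
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "C:/tmp/spark")
      .config("spark.sql.streaming.checkpointLocation", "C:/tmp/spark/spark-checkpoint")
      .appName("my-test")
      .getOrCreate
      .readStream
      .schema(schema)
      .json("src/test/data")
      // no .cache here -- adding it is what triggers the AnalysisException below
      .writeStream
      .format("console") // assumed sink; the original chain calls .start with no format
      .start
      .awaitTermination
  }
}
```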
Running this sample on Spark 2.1.0, I got an error. Without the .cache option it worked as intended, but with .cache added I got:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
FileSource[src/test/data]
	at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:196)
	at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:35)
	at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:33)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:128)
	at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:33)
	at org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:58)
	at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:69)
	at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:67)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:73)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:73)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:79)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:84)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:84)
	at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:102)
	at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:65)
	at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:89)
	at org.apache.spark.sql.Dataset.persist(Dataset.scala:2479)
	at org.apache.spark.sql.Dataset.cache(Dataset.scala:2489)
	at org.me.App$.main(App.scala:23)
	at org.me.App.main(App.scala)
Any ideas?