
Apache Spark and Java error - Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2

I am new to the Spark framework. I tried to create a sample application using Spark and Java. I have the following code

pom.xml

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.1</version>
</dependency>

Source

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;

public class SparkTest {
    public static void main(String[] args)  {
        SparkConf sparkConf = new SparkConf()
                .setAppName("Example Spark App")
                .setMaster("local[*]"); // Delete this line when submitting to a cluster
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaRDD<String> stringJavaRDD = sparkContext.textFile("nationalparks.csv");
        System.out.println("Number of lines in file = " + stringJavaRDD.count());
    }
}

I am trying to run the above code from the IntelliJ IDE, but I get the following error:

"C:\Program Files\Java\jdk-11\bin\java.exe" "-javaagent:C:\Users\amanaf\AppData\Local\JetBrains\IntelliJ IDEA Community Edition 2018.3\lib\idea_rt.jar=55665:C:\Users\amanaf\AppData\Local\JetBrains\IntelliJ IDEA Community Edition 2018.3\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\amanaf\IdeaProjects\testApp\target\classes;C:\Users\amanaf\.m2\repository\org\apache\spark\spark-core_2.10\1.6.1\spark-core_2.10-1.6.1.jar;C:\Users\amanaf\.m2\repository\org\apache\avro\avro-mapred\1.7.7\avro-mapred-1.7.7-hadoop2.jar;C:\Users\amanaf\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7.jar;C:\Users\amanaf\.m2\repository\org\apache\avro\avro\1.7.7\avro-1.7.7.jar;C:\Users\amanaf\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7-tests.jar;C:\Users\amanaf\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;C:\Users\amanaf\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;C:\Users\amanaf\.m2\repository\com\twitter\chill_2.10\0.5.0\chill_2.10-0.5.0.jar;C:\Users\amanaf\.m2\repository\com\esotericsoftware\kryo\kryo\2.21\kryo-2.21.jar;C:\Users\amanaf\.m2\repository\com\esotericsoftware\reflectasm\reflectasm\1.07\reflectasm-1.07-shaded.jar;C:\Users\amanaf\.m2\repository\com\esotericsoftware\minlog\minlog\1.2\minlog-1.2.jar;C:\Users\amanaf\.m2\repository\org\objenesis\objenesis\1.2\objenesis-1.2.jar;C:\Users\amanaf\.m2\repository\com\twitter\chill-java\0.5.0\chill-java-0.5.0.jar;C:\Users\amanaf\.m2\repository\org\apache\xbean\xbean-asm5-shaded\4.4\xbean-asm5-shaded-4.4.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-client\2.2.0\hadoop-client-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-common\2.2.0\hadoop-common-2.2.0.jar;C:\Users\amanaf\.m2\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\Users\amanaf\.m2\repository\org\apache\commons\commons-math\2.1\commons-math-2.1.jar;C:\Users\amanaf\.m2\repository\xmlenc\xmlenc\0.52\xmlenc-0.52.jar;C:\U
sers\amanaf\.m2\repository\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;C:\Users\amanaf\.m2\repository\commons-collections\commons-collections\3.2.1\commons-collections-3.2.1.jar;C:\Users\amanaf\.m2\repository\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;C:\Users\amanaf\.m2\repository\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;C:\Users\amanaf\.m2\repository\commons-beanutils\commons-beanutils-core\1.8.0\commons-beanutils-core-1.8.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-auth\2.2.0\hadoop-auth-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\commons\commons-compress\1.4.1\commons-compress-1.4.1.jar;C:\Users\amanaf\.m2\repository\org\tukaani\xz\1.0\xz-1.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-hdfs\2.2.0\hadoop-hdfs-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-app\2.2.0\hadoop-mapreduce-client-app-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-common\2.2.0\hadoop-mapreduce-client-common-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-yarn-client\2.2.0\hadoop-yarn-client-2.2.0.jar;C:\Users\amanaf\.m2\repository\com\google\inject\guice\3.0\guice-3.0.jar;C:\Users\amanaf\.m2\repository\javax\inject\javax.inject\1\javax.inject-1.jar;C:\Users\amanaf\.m2\repository\aopalliance\aopalliance\1.0\aopalliance-1.0.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\jersey-test-framework\jersey-test-framework-grizzly2\1.9\jersey-test-framework-grizzly2-1.9.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\jersey-test-framework\jersey-test-framework-core\1.9\jersey-test-framework-core-1.9.jar;C:\Users\amanaf\.m2\repository\javax\servlet\javax.servlet-api\3.0.1\javax.servlet-api-3.0.1.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\jersey-client\1.9\jersey-client-1.9.jar;C:\Users\amana
f\.m2\repository\com\sun\jersey\jersey-grizzly2\1.9\jersey-grizzly2-1.9.jar;C:\Users\amanaf\.m2\repository\org\glassfish\grizzly\grizzly-http\2.1.2\grizzly-http-2.1.2.jar;C:\Users\amanaf\.m2\repository\org\glassfish\grizzly\grizzly-framework\2.1.2\grizzly-framework-2.1.2.jar;C:\Users\amanaf\.m2\repository\org\glassfish\gmbal\gmbal-api-only\3.0.0-b023\gmbal-api-only-3.0.0-b023.jar;C:\Users\amanaf\.m2\repository\org\glassfish\external\management-api\3.0.0-b012\management-api-3.0.0-b012.jar;C:\Users\amanaf\.m2\repository\org\glassfish\grizzly\grizzly-http-server\2.1.2\grizzly-http-server-2.1.2.jar;C:\Users\amanaf\.m2\repository\org\glassfish\grizzly\grizzly-rcm\2.1.2\grizzly-rcm-2.1.2.jar;C:\Users\amanaf\.m2\repository\org\glassfish\grizzly\grizzly-http-servlet\2.1.2\grizzly-http-servlet-2.1.2.jar;C:\Users\amanaf\.m2\repository\org\glassfish\javax.servlet\3.1\javax.servlet-3.1.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\jersey-json\1.9\jersey-json-1.9.jar;C:\Users\amanaf\.m2\repository\org\codehaus\jettison\jettison\1.1\jettison-1.1.jar;C:\Users\amanaf\.m2\repository\stax\stax-api\1.0.1\stax-api-1.0.1.jar;C:\Users\amanaf\.m2\repository\com\sun\xml\bind\jaxb-impl\2.2.3-1\jaxb-impl-2.2.3-1.jar;C:\Users\amanaf\.m2\repository\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;C:\Users\amanaf\.m2\repository\javax\activation\activation\1.1\activation-1.1.jar;C:\Users\amanaf\.m2\repository\org\codehaus\jackson\jackson-jaxrs\1.8.3\jackson-jaxrs-1.8.3.jar;C:\Users\amanaf\.m2\repository\org\codehaus\jackson\jackson-xc\1.8.3\jackson-xc-1.8.3.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\contribs\jersey-guice\1.9\jersey-guice-1.9.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-yarn-server-common\2.2.0\hadoop-yarn-server-common-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.2.0\hadoop-mapreduce-client-shuffle-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-yarn-api\2.2.0\hadoop-yarn-api-2.2.0.jar;C:
\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.2.0\hadoop-mapreduce-client-core-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-yarn-common\2.2.0\hadoop-yarn-common-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.2.0\hadoop-mapreduce-client-jobclient-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\hadoop\hadoop-annotations\2.2.0\hadoop-annotations-2.2.0.jar;C:\Users\amanaf\.m2\repository\org\apache\spark\spark-launcher_2.10\1.6.1\spark-launcher_2.10-1.6.1.jar;C:\Users\amanaf\.m2\repository\org\apache\spark\spark-network-common_2.10\1.6.1\spark-network-common_2.10-1.6.1.jar;C:\Users\amanaf\.m2\repository\org\apache\spark\spark-network-shuffle_2.10\1.6.1\spark-network-shuffle_2.10-1.6.1.jar;C:\Users\amanaf\.m2\repository\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;C:\Users\amanaf\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.4.4\jackson-annotations-2.4.4.jar;C:\Users\amanaf\.m2\repository\org\apache\spark\spark-unsafe_2.10\1.6.1\spark-unsafe_2.10-1.6.1.jar;C:\Users\amanaf\.m2\repository\net\java\dev\jets3t\jets3t\0.7.1\jets3t-0.7.1.jar;C:\Users\amanaf\.m2\repository\commons-codec\commons-codec\1.3\commons-codec-1.3.jar;C:\Users\amanaf\.m2\repository\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;C:\Users\amanaf\.m2\repository\org\apache\curator\curator-recipes\2.4.0\curator-recipes-2.4.0.jar;C:\Users\amanaf\.m2\repository\org\apache\curator\curator-framework\2.4.0\curator-framework-2.4.0.jar;C:\Users\amanaf\.m2\repository\org\apache\curator\curator-client\2.4.0\curator-client-2.4.0.jar;C:\Users\amanaf\.m2\repository\org\apache\zookeeper\zookeeper\3.4.5\zookeeper-3.4.5.jar;C:\Users\amanaf\.m2\repository\jline\jline\0.9.94\jline-0.9.94.jar;C:\Users\amanaf\.m2\repository\com\google\guava\guava\14.0.1\guava-14.0.1.jar;C:\Users\amanaf\.m2\repository\org\eclipse\jetty\orbit\javax.servlet\3.0.0.v201112011016\javax
.servlet-3.0.0.v201112011016.jar;C:\Users\amanaf\.m2\repository\org\apache\commons\commons-lang3\3.3.2\commons-lang3-3.3.2.jar;C:\Users\amanaf\.m2\repository\org\apache\commons\commons-math3\3.4.1\commons-math3-3.4.1.jar;C:\Users\amanaf\.m2\repository\com\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;C:\Users\amanaf\.m2\repository\org\slf4j\slf4j-api\1.7.10\slf4j-api-1.7.10.jar;C:\Users\amanaf\.m2\repository\org\slf4j\jul-to-slf4j\1.7.10\jul-to-slf4j-1.7.10.jar;C:\Users\amanaf\.m2\repository\org\slf4j\jcl-over-slf4j\1.7.10\jcl-over-slf4j-1.7.10.jar;C:\Users\amanaf\.m2\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar;C:\Users\amanaf\.m2\repository\org\slf4j\slf4j-log4j12\1.7.10\slf4j-log4j12-1.7.10.jar;C:\Users\amanaf\.m2\repository\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;C:\Users\amanaf\.m2\repository\org\xerial\snappy\snappy-java\1.1.2\snappy-java-1.1.2.jar;C:\Users\amanaf\.m2\repository\net\jpountz\lz4\lz4\1.3.0\lz4-1.3.0.jar;C:\Users\amanaf\.m2\repository\org\roaringbitmap\RoaringBitmap\0.5.11\RoaringBitmap-0.5.11.jar;C:\Users\amanaf\.m2\repository\commons-net\commons-net\2.2\commons-net-2.2.jar;C:\Users\amanaf\.m2\repository\com\typesafe\akka\akka-remote_2.10\2.3.11\akka-remote_2.10-2.3.11.jar;C:\Users\amanaf\.m2\repository\com\typesafe\akka\akka-actor_2.10\2.3.11\akka-actor_2.10-2.3.11.jar;C:\Users\amanaf\.m2\repository\com\typesafe\config\1.2.1\config-1.2.1.jar;C:\Users\amanaf\.m2\repository\io\netty\netty\3.8.0.Final\netty-3.8.0.Final.jar;C:\Users\amanaf\.m2\repository\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;C:\Users\amanaf\.m2\repository\org\uncommons\maths\uncommons-maths\1.2.2a\uncommons-maths-1.2.2a.jar;C:\Users\amanaf\.m2\repository\com\typesafe\akka\akka-slf4j_2.10\2.3.11\akka-slf4j_2.10-2.3.11.jar;C:\Users\amanaf\.m2\repository\org\scala-lang\scala-library\2.10.5\scala-library-2.10.5.jar;C:\Users\amanaf\.m2\repository\org\json4s\json4s-jackson_2.10\3.2.10\json4s-jackson_2.10-3.2.10.jar;C:\Users\amanaf\.m2\repositor
y\org\json4s\json4s-core_2.10\3.2.10\json4s-core_2.10-3.2.10.jar;C:\Users\amanaf\.m2\repository\org\json4s\json4s-ast_2.10\3.2.10\json4s-ast_2.10-3.2.10.jar;C:\Users\amanaf\.m2\repository\org\scala-lang\scalap\2.10.0\scalap-2.10.0.jar;C:\Users\amanaf\.m2\repository\org\scala-lang\scala-compiler\2.10.0\scala-compiler-2.10.0.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\jersey-server\1.9\jersey-server-1.9.jar;C:\Users\amanaf\.m2\repository\asm\asm\3.1\asm-3.1.jar;C:\Users\amanaf\.m2\repository\com\sun\jersey\jersey-core\1.9\jersey-core-1.9.jar;C:\Users\amanaf\.m2\repository\org\apache\mesos\mesos\0.21.1\mesos-0.21.1-shaded-protobuf.jar;C:\Users\amanaf\.m2\repository\io\netty\netty-all\4.0.29.Final\netty-all-4.0.29.Final.jar;C:\Users\amanaf\.m2\repository\com\clearspring\analytics\stream\2.7.0\stream-2.7.0.jar;C:\Users\amanaf\.m2\repository\io\dropwizard\metrics\metrics-core\3.1.2\metrics-core-3.1.2.jar;C:\Users\amanaf\.m2\repository\io\dropwizard\metrics\metrics-jvm\3.1.2\metrics-jvm-3.1.2.jar;C:\Users\amanaf\.m2\repository\io\dropwizard\metrics\metrics-json\3.1.2\metrics-json-3.1.2.jar;C:\Users\amanaf\.m2\repository\io\dropwizard\metrics\metrics-graphite\3.1.2\metrics-graphite-3.1.2.jar;C:\Users\amanaf\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.4.4\jackson-databind-2.4.4.jar;C:\Users\amanaf\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.4.4\jackson-core-2.4.4.jar;C:\Users\amanaf\.m2\repository\com\fasterxml\jackson\module\jackson-module-scala_2.10\2.4.4\jackson-module-scala_2.10-2.4.4.jar;C:\Users\amanaf\.m2\repository\org\scala-lang\scala-reflect\2.10.4\scala-reflect-2.10.4.jar;C:\Users\amanaf\.m2\repository\com\thoughtworks\paranamer\paranamer\2.6\paranamer-2.6.jar;C:\Users\amanaf\.m2\repository\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;C:\Users\amanaf\.m2\repository\oro\oro\2.0.8\oro-2.0.8.jar;C:\Users\amanaf\.m2\repository\org\tachyonproject\tachyon-client\0.8.2\tachyon-client-0.8.2.jar;C:\Users\amanaf\.m2\repository\commons-lang\comm
ons-lang\2.4\commons-lang-2.4.jar;C:\Users\amanaf\.m2\repository\commons-io\commons-io\2.4\commons-io-2.4.jar;C:\Users\amanaf\.m2\repository\org\tachyonproject\tachyon-underfs-hdfs\0.8.2\tachyon-underfs-hdfs-0.8.2.jar;C:\Users\amanaf\.m2\repository\org\tachyonproject\tachyon-underfs-s3\0.8.2\tachyon-underfs-s3-0.8.2.jar;C:\Users\amanaf\.m2\repository\org\tachyonproject\tachyon-underfs-local\0.8.2\tachyon-underfs-local-0.8.2.jar;C:\Users\amanaf\.m2\repository\net\razorvine\pyrolite\4.9\pyrolite-4.9.jar;C:\Users\amanaf\.m2\repository\net\sf\py4j\py4j\0.9\py4j-0.9.jar;C:\Users\amanaf\.m2\repository\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar SparkTest
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/29 06:02:59 INFO SparkContext: Running Spark version 1.6.1
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/Users/amanaf/.m2/repository/org/apache/hadoop/hadoop-auth/2.2.0/hadoop-auth-2.2.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
18/11/29 06:02:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/29 06:02:59 INFO SecurityManager: Changing view acls to: amanaf
18/11/29 06:02:59 INFO SecurityManager: Changing modify acls to: amanaf
18/11/29 06:02:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(amanaf); users with modify permissions: Set(amanaf)
18/11/29 06:03:00 INFO PlatformDependent: Your platform does not provide complete low-level API for accessing direct buffers reliably. Unless explicitly requested, heap buffer will always be preferred to avoid potential system unstability.
18/11/29 06:03:00 INFO Utils: Successfully started service 'sparkDriver' on port 55702.
18/11/29 06:03:00 INFO Slf4jLogger: Slf4jLogger started
18/11/29 06:03:00 INFO Remoting: Starting remoting
18/11/29 06:03:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.20.255.74:55715]
18/11/29 06:03:00 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 55715.
18/11/29 06:03:00 INFO SparkEnv: Registering MapOutputTracker
18/11/29 06:03:00 INFO SparkEnv: Registering BlockManagerMaster
18/11/29 06:03:00 INFO DiskBlockManager: Created local directory at C:\Users\amanaf\AppData\Local\Temp\blockmgr-183dfab1-dc04-401d-9b91-6caf7861709d
18/11/29 06:03:00 INFO MemoryStore: MemoryStore started with capacity 2.8 GB
18/11/29 06:03:00 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/29 06:03:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/29 06:03:00 INFO SparkUI: Started SparkUI at http://172.20.255.74:4040
18/11/29 06:03:01 INFO Executor: Starting executor ID driver on host localhost
18/11/29 06:03:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55752.
18/11/29 06:03:01 INFO NettyBlockTransferService: Server created on 55752
18/11/29 06:03:01 INFO BlockManagerMaster: Trying to register BlockManager
18/11/29 06:03:01 INFO BlockManagerMasterEndpoint: Registering block manager localhost:55752 with 2.8 GB RAM, BlockManagerId(driver, localhost, 55752)
18/11/29 06:03:01 INFO BlockManagerMaster: Registered BlockManager
18/11/29 06:03:01 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 73.9 KB, free 73.9 KB)
18/11/29 06:03:01 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 83.7 KB)
18/11/29 06:03:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:55752 (size: 9.8 KB, free: 2.8 GB)
18/11/29 06:03:01 INFO SparkContext: Created broadcast 0 from textFile at SparkTest.java:12
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:440)
    at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:46)
    at SparkTest.main(SparkTest.java:13)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:48)
    ... 23 more
18/11/29 06:03:01 INFO SparkContext: Invoking stop() from shutdown hook
18/11/29 06:03:01 INFO SparkUI: Stopped Spark web UI at http://172.20.255.74:4040
18/11/29 06:03:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/11/29 06:03:01 INFO MemoryStore: MemoryStore cleared
18/11/29 06:03:01 INFO BlockManager: BlockManager stopped
18/11/29 06:03:01 INFO BlockManagerMaster: BlockManagerMaster stopped
18/11/29 06:03:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/11/29 06:03:01 INFO SparkContext: Successfully stopped SparkContext
18/11/29 06:03:01 INFO ShutdownHookManager: Shutdown hook called
18/11/29 06:03:01 INFO ShutdownHookManager: Deleting directory C:\Users\amanaf\AppData\Local\Temp\spark-38128353-d1ea-4f8e-9edb-62b97a6fa4b5
18/11/29 06:03:01 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

Process finished with exit code 1

Is there anything I missed in the code?

Comments

The actual error in your log is:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2

This is caused by an issue in the hadoop-common library when running on Java 9 and above. For details, see https://issues.apache.org/jira/browse/HADOOP-14586.
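The cause can be reproduced in isolation. Hadoop 2.x's Shell class parses the java.version system property in a static initializer, roughly like this (a simplified sketch for illustration, not the actual Hadoop source):

```java
public class JavaVersionCheck {
    static boolean isJava7OrAbove(String version) {
        // Assumes java.version looks like "1.8.0_191", so the first three
        // characters ("1.8") identify the major version.
        return version.substring(0, 3).compareTo("1.7") >= 0;
    }

    public static void main(String[] args) {
        // Fine on Java 8, where java.version is e.g. "1.8.0_191":
        System.out.println(isJava7OrAbove("1.8.0_191"));
        // On Java 11, java.version is just "11" (length 2), so substring(0, 3)
        // throws StringIndexOutOfBoundsException: begin 0, end 3, length 2 --
        // the exact "Caused by" line in the log above:
        try {
            isJava7OrAbove("11");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Java 11 breaks it: " + e.getMessage());
        }
    }
}
```

Since Java 9, the version string no longer starts with "1." (JEP 223), which is why the old three-character parse fails.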

This issue is fixed in the Spark 3.0.0 release; see https://issues.apache.org/jira/browse/SPARK-26134. So for now you can downgrade your Java version to Java 8.
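If you build with Maven, you can also pin the compiler to Java 8 so the project does not silently pick up a newer JDK (a minimal sketch; you still need a JDK 8 installed, since your log shows the run used jdk-11):

```xml
<properties>
    <!-- Compile for Java 8; assumes a JDK 8 installation is available -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```

In IntelliJ, also point the Project SDK (File > Project Structure > Project) at the JDK 8 installation, because the run configuration launches with the project's JDK.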

I don't think anything will work on Java 11; there is a truckload of things that need to be done. The stack trace of this one looks like something minor about splitting java.version fields.

See HADOOP-15338 for the TODO list for the Hadoop libraries; I don't know of equivalent lists for the Spark or even Scala libraries.

Options

  1. Change the Java version in the IDE.
  2. Come and help fix all the Java 11 issues; you are very welcome to join in.

When you are using Spark, you have to make sure all of your software versions are compatible. For example, Spark 2.4.0 is compatible with Hadoop 2.7 or later. And from https://spark.apache.org/docs/latest/:

"Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x)."

Even though it says Java 8+, anything newer than Java 8 is probably not well tested.
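If you want to fail fast with a clearer message than the StringIndexOutOfBoundsException above, one option (a defensive sketch, not standard Spark practice) is to check the JVM before creating the SparkContext:

```java
public class JvmCheck {
    // java.specification.version is "1.8" on Java 8, and "9", "10", "11", ...
    // on later releases.
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        if (!isJava8(version)) {
            throw new IllegalStateException(
                "Spark 1.x/2.x is only tested on Java 8; this JVM is " + version);
        }
        System.out.println("Running on Java 8, OK to start Spark");
    }
}
```

Call this check at the top of main() so the application stops with an actionable message instead of a cryptic Hadoop stack trace.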

