Saturday, July 27, 2019

General Errors after Hive installation

We may get a couple of errors when we try to start Hive via the bin/hive command. The errors and their corresponding fixes follow:

Error #1:
Exception in thread "main" java.lang.RuntimeException: Couldn't create directory ${system:java.io.tmpdir}/${hive.session.id}_resources

Fix #1: edit hive-site.xml:
  <property>
    <name>hive.downloaded.resources.dir</name>
    <!--
    <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
    -->
    <value>/home/hduser/hive/tmp/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
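Before restarting Hive, it can help to confirm that the replacement path is usable by the account that runs Hive. The following is a minimal sketch, not part of Hive itself: the object and method names are hypothetical, and it only uses the standard java.nio.file API to create the directory if needed and report whether it is writable.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not a Hive API.
object ScratchDirCheck {
  // Creates the directory (and any missing parents), then reports
  // whether it exists as a directory the current user can write to.
  // Throws an IOException if the directory cannot be created.
  def ensureWritableDir(dir: Path): Boolean = {
    Files.createDirectories(dir)
    Files.isDirectory(dir) && Files.isWritable(dir)
  }
}
```

For the value used in Fix #1, the call would be ScratchDirCheck.ensureWritableDir(Paths.get("/home/hduser/hive/tmp")), run as the same user that starts Hive.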

Error #2:
java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
Fix #2: replace ${system:java.io.tmpdir}/${system:user.name} with /tmp/mydir in hive-site.xml (see Confluence - AdminManual Configuration):
  <property>
    <name>hive.exec.local.scratchdir</name>
    <!--
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
    -->
    <value>/tmp/mydir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
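The odd-looking message in Error #2 can be reproduced outside Hive. The sketch below assumes that the unexpanded placeholder string is split at its first colon and handed to java.net.URI as a scheme plus a path; that is an assumption about what happens inside the Hive/Hadoop path handling, not something stated above. The object and method names are hypothetical.

```scala
import java.net.{URI, URISyntaxException}

// Hypothetical demo object; only java.net.URI is a real API here.
object UnexpandedPlaceholderDemo {
  // Treats everything before the first ':' as a URI scheme and the
  // rest as the path. Returns the exception message if java.net.URI
  // rejects the combination, or None if the URI is accepted.
  def uriError(raw: String): Option[String] = {
    val colon = raw.indexOf(':')
    val scheme: String = if (colon >= 0) raw.substring(0, colon) else null
    val path: String = if (colon >= 0) raw.substring(colon + 1) else raw
    try {
      new URI(scheme, null, path, null, null)
      None
    } catch {
      case e: URISyntaxException => Some(e.getMessage)
    }
  }
}
```

Calling uriError on the literal text ${system:java.io.tmpdir}/${system:user.name} yields a "Relative path in absolute URI" message, because the text after the colon does not start with a slash. Calling it on /tmp/mydir yields None, which is why a plain absolute path makes the error go away.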

Now that the errors are fixed, let's start the Hive CLI:

hduser@laptop:/usr/local/apache-hive-2.1.0-bin/bin$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

To display all the tables:

hive> show tables;
OK
Time taken: 4.603 seconds

We can exit the Hive shell with the exit command:

hive> exit;
hduser@laptop:/usr/local/apache-hive-2.1.0-bin/bin$ 

Caused by: java.net.URISyntaxException: while starting Hive

Error Code:

Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D


Solution:
Add the following properties at the beginning of hive-site.xml:
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>
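One way to picture why this works: once properties named system:java.io.tmpdir and system:user.name exist, the ${...} references elsewhere in the file have something to resolve to. The sketch below is a toy model of that substitution, not Hive's or Hadoop's actual code; the object name, the method name, and the hduser value are assumptions made for illustration.

```scala
import scala.util.matching.Regex

// Toy model of placeholder expansion; not the real Hive/Hadoop logic.
object SubstitutionSketch {
  // Matches ${name} and captures the name between the braces.
  private val VarRef: Regex = """\$\{([^}]+)\}""".r

  // Replaces each ${name} with props(name) when the name is defined,
  // and leaves the reference untouched when it is not.
  def expand(value: String, props: Map[String, String]): String =
    VarRef.replaceAllIn(value, m =>
      Regex.quoteReplacement(props.getOrElse(m.group(1), m.matched)))
}
```

With a map containing system:java.io.tmpdir -> /tmp/hive/java and system:user.name -> hduser, expanding ${system:java.io.tmpdir}/${system:user.name} gives /tmp/hive/java/hduser. With an empty map the string comes back unchanged, which corresponds to the unexpanded text seen in the error message.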

Thursday, July 25, 2019

Install Hive on windows: 'hive' is not recognized as an internal or external command, operable program or batch file


If someone is still running into this problem, here's what I did to get Hive installed on Windows.
My configuration is as below (latest as of this date). I am using Windows 10:
  • Hadoop 2.9.1
  • derby 10.14
  • hive 2.3.4 (my Hive version does not contain bin/hive.cmd, the file needed to run Hive on Windows)
@wheeler above mentioned that Hive is for Linux. Here's the hack to make it work on Windows. My Hive installation did not come with the Windows executable files, hence the hack.
STEP 1
There are 3 files which you need to download from https://svn.apache.org/repos/:
  1. https://svn.apache.org/repos/asf/hive/trunk/bin/hive.cmd (save it in your %HIVE_HOME%/bin/ as hive.cmd)
  2. https://svn.apache.org/repos/asf/hive/trunk/bin/ext/cli.cmd (save it in your %HIVE_HOME%/bin/ext/ as cli.cmd)
  3. https://svn.apache.org/repos/asf/hive/trunk/bin/ext/util/execHiveCmd.cmd (save it in your %HIVE_HOME%/bin/ext/util/ as execHiveCmd.cmd)
where %HIVE_HOME% is where Hive is installed.
STEP 2
Create a tmp directory under your HIVE_HOME (on the local machine, not on HDFS) and give it 777 permissions.
STEP 3
Open your conf/hive-default.xml.template and save it as conf/hive-site.xml. Then, in this hive-site.xml, paste the properties below at the top, under the opening <configuration> tag:
<property>
    <name>system:java.io.tmpdir</name>
    <value>{PUT YOUR HIVE HOME DIR PATH HERE}/tmp</value> 
    <!-- MY PATH WAS C:/BigData/hive/tmp -->
</property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
</property>
(check the indents)
STEP 4 - Run Hadoop services
start-dfs
start-yarn
  • Run derby
StartNetworkServer -h 0.0.0.0
Make sure all of the above services are running, then open cmd in %HIVE_HOME%/bin and run the hive command:
hive

Friday, July 19, 2019

Regex: Finding Patterns in Strings

Problem

You need to determine whether a String contains a regular expression pattern.

Solution

Create a Regex object by invoking the .r method on a String, and then use that pattern with findFirstIn when you’re looking for one match, and findAllIn when looking for all matches.
To demonstrate this, first create a Regex for the pattern you want to search for, in this case, a sequence of one or more numeric characters:
scala> val numPattern = "[0-9]+".r
numPattern: scala.util.matching.Regex = [0-9]+
Next, create a sample String you can search:
scala> val address = "123 Main Street Suite 101"
address: java.lang.String = 123 Main Street Suite 101
The findFirstIn method finds the first match:
scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)
(Notice that this method returns an Option[String]. I’ll dig into that in the Discussion.)
When looking for multiple matches, use the findAllIn method:
scala> val matches = numPattern.findAllIn(address)
matches: scala.util.matching.Regex.MatchIterator = non-empty iterator
As you can see, findAllIn returns an iterator, which lets you loop over the results:
scala> matches.foreach(println)
123
101
If findAllIn doesn’t find any results, an empty iterator is returned, so you can still write your code just like that; you don’t need to check whether the result is null. If you’d rather have the results as an Array, add the toArray method after the findAllIn call:
scala> val matches = numPattern.findAllIn(address).toArray
matches: Array[String] = Array(123, 101)
If there are no matches, this approach yields an empty Array. Other methods like toList, toSeq, and toVector are also available.
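As a compact illustration of the toList variant, here is a minimal sketch outside the REPL. The object and method names are hypothetical; the pattern and the sample strings are the ones used in this recipe.

```scala
// Hypothetical wrapper around the recipe's pattern.
object NumberExtractor {
  private val numPattern = "[0-9]+".r

  // Returns every run of digits in the input, in order of appearance.
  // The list is empty when the input contains no digits.
  def numbersIn(s: String): List[String] =
    numPattern.findAllIn(s).toList
}
```

Because the no-match case is just an empty list, callers can map, filter, or loop over the result without a separate check.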

Discussion

Using the .r method on a String is the easiest way to create a Regex object. Another approach is to import the Regex class, create a Regex instance, and then use the instance in the same way:
scala> import scala.util.matching.Regex
import scala.util.matching.Regex

scala> val numPattern = new Regex("[0-9]+")
numPattern: scala.util.matching.Regex = [0-9]+

scala> val address = "123 Main Street Suite 101"
address: java.lang.String = 123 Main Street Suite 101

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)
Although this is a bit more work, it’s also more obvious. I’ve found that it can be easy to overlook the .r at the end of a String (and then spend a few minutes wondering how the code I saw could possibly work).

Handling the Option returned by findFirstIn

As mentioned in the Solution, the findFirstIn method finds the first match in the String and returns an Option[String]:
scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)
The Option/Some/None pattern is discussed in detail in Recipe 20.6, but the simple way to think about an Option is that it’s a container that holds either zero or one values. In the case of findFirstIn, if it succeeds, it returns the string “123” as a Some(123), as shown in this example. However, if it fails to find the pattern in the string it’s searching, it will return a None, as shown here:
scala> val address = "No address given"
address: String = No address given

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = None
To summarize, a method defined to return an Option[String] will either return a Some(String), or a None.
The normal way to work with an Option is to use one of these approaches:
  • Call getOrElse on the value.
  • Use the Option in a match expression.
  • Use the Option in a foreach loop.
Recipe 20.6 describes those approaches in detail, but they’re demonstrated here for your convenience.
With the getOrElse approach, you attempt to “get” the result, while also specifying a default value that should be used if the method failed:
scala> val result = numPattern.findFirstIn(address).getOrElse("no match")
result: String = 123
Because an Option is a collection of zero or one elements, an experienced Scala developer will also use a foreach loop in this situation:
numPattern.findFirstIn(address).foreach { e =>
  // perform the next step in your algorithm,
  // operating on the value 'e'
}
A match expression also provides a very readable solution to the problem:
match1 match {
  case Some(s) => println(s"Found: $s")
  case None => // no match, so there is nothing to do
}
See Recipe 20.6 for more information.
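The three approaches can also be put side by side in one small object, which makes it easy to try each of them on both a matching and a non-matching string. This is only a sketch; the object and method names, and the "Nothing found" text, are made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helpers showing three ways to consume the Option
// returned by findFirstIn.
object FirstNumber {
  private val numPattern = "[0-9]+".r

  // getOrElse: fall back to a default when there is no match.
  def orDefault(s: String): String =
    numPattern.findFirstIn(s).getOrElse("no match")

  // match expression: handle Some and None explicitly.
  def describe(s: String): String =
    numPattern.findFirstIn(s) match {
      case Some(n) => s"Found: $n"
      case None    => "Nothing found"
    }

  // foreach: the body runs once for Some and not at all for None.
  def collect(s: String): List[String] = {
    val found = ListBuffer.empty[String]
    numPattern.findFirstIn(s).foreach(found += _)
    found.toList
  }
}
```

Note that the result of each call depends on the string passed in: "123 Main Street Suite 101" takes the Some branch everywhere, while "No address given" produces the default, the None case, and an empty list respectively.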
To summarize this approach, the following REPL example shows the complete process of creating a Regex, searching a String with findFirstIn, and then using a foreach loop on the resulting match:
scala> val numPattern = "[0-9]+".r
numPattern: scala.util.matching.Regex = [0-9]+

scala> val address = "123 Main Street Suite 101"
address: String = 123 Main Street Suite 101

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)

scala> match1.foreach { e =>
     |   println(s"Found a match: $e")
     | }
Found a match: 123