I've recently been doing some work with Hadoop using the Hortonworks distribution. Most recently I configured Knox to integrate with Active Directory. The end goal was to be able to authenticate with Active Directory via Knox (a REST API Gateway) and then on to other services like Hive. I also configured Knox to point to Zookeeper (HA service discovery) vs. Hive directly, but that's really more detail than we need for integrating Knox with AD.
The Knox documentation is really good and very helpful:
https://knox.apache.org/books/knox-0-9-0/user-guide.html
The first thing I did was test Knox using the demo LDAP service.
From Ambari > Knox > Service Actions > Start Demo LDAP
I'm going to gloss over this because it's a generic test and pretty simple to figure out. One note: you're able to add users to the demo LDAP service via the "Advanced users-ldif" configuration listed in the Knox Configs section of Ambari. The default "guest" and "admin" accounts, as well as their plain-text passwords, are in that config location as well.
A couple of status commands I found helpful for this:
/usr/hdp/current/knox-server/bin/gateway.sh status
/usr/hdp/current/knox-server/bin/ldap.sh status
(you can also start/stop)
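For what it's worth, a quick way to exercise the demo LDAP setup end to end is to hit a Knox-exposed service with the demo guest account. This is just a sketch, assuming the default topology name, the default gateway port of 8443, and a placeholder hostname:
curl -i -k -u guest:guest-password 'https://knoxhost.example.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS'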
Once the Knox service was verified, I proceeded to test the configuration from Knox to Hive. In order to test this, the Hive authentication (Ambari > Hive > Configs > HiveServer2 Authentication) was set to LDAP.
Once that tested successfully, the next test was Knox to Hive via Zookeeper. Because I had previously enabled Kerberos in my cluster, I needed to change the Hive authentication to use Kerberos. I had initially been using beeline to test jdbc connections to Hive, but with Knox you need to test from outside the Hadoop cluster. To achieve this I went with SQuirreL SQL Client on Windows; any jdbc-compliant client will work. I also set up the Hortonworks Hive ODBC Driver and tested from a linked server in SSMS, but I digress.
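For anyone setting up a client the same way, the JDBC URL for Hive through Knox takes roughly this shape (host, port, topology name, and truststore details are placeholders for your environment):
jdbc:hive2://knoxhost.example.com:8443/;ssl=true;sslTrustStore=/path/to/gateway-truststore.jks;trustStorePassword=changeit;transportMode=http;httpPath=gateway/default/hive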
OK, so with Knox installed and communication to Hive verified, I proceeded to work through the following article to get Knox and AD working together:
https://cwiki.apache.org/confluence/display/KNOX/Using+Apache+Knox+with+ActiveDirectory
This documentation is very good and I recommend taking your time, reading it all, and saving each sample file as illustrated. This allows you to go back and reference prior settings.
The documentation repeatedly references ldapwhoami and ldapsearch. I'll admit I initially attempted to configure all of this without using these utilities and relied only on the Windows tools: Active Directory Users and Computers and ldp.exe. Please take my advice and install the Linux clients:
yum install openldap-clients
ldapsearch provides some really useful details and I had fun working with it. Getting a better look at the AD internals that aren't easily determined from Active Directory Users and Computers was really helpful.
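As an example of the kind of query that pays off, something like the following returns a user's DN and group memberships, which maps directly onto the search-base values you'll need shortly (the server, bind account, base DN, and account name are all placeholders):
ldapsearch -x -H ldap://ad.example.com:389 -D "binduser@example.com" -W \
    -b "dc=example,dc=com" "(sAMAccountName=jsmith)" dn memberOf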
In spite of the documentation being really good, it does mention that it assumes default locations and that you'll need to get the "correct values for your environment". By using the aforementioned tools and running through several ldapsearch queries I was eventually able to determine the necessary values.
I wish I could walk you through what I did exactly, however it's a bit too specific to my environment to post on a blog. The specifics I can give you are that the tools helped me to determine the following values that ended up in my topology file:
Parameter | Description
main.ldapRealm.contextFactory.url | The LDAP URL of the Active Directory server (protocol, host, and port)
main.ldapRealm.contextFactory.systemUsername | The system (bind) user that runs searches
main.ldapRealm.contextFactory.systemPassword | The password for the system user
main.ldapRealm.userSearchBase | The subtree of users to search during authentication
main.ldapRealm.groupSearchBase | The subtree of groups to search for user membership
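To give you an idea of where those land, here's a sketch of the authentication provider section of a topology file with those parameters filled in with placeholder values; a working provider also needs the rest of the main.ldapRealm settings covered in the Knox/AD documentation:
<provider>
    <role>authentication</role>
    <name>ShiroProvider</name>
    <enabled>true</enabled>
    <param>
        <name>main.ldapRealm.contextFactory.url</name>
        <value>ldap://ad.example.com:389</value>
    </param>
    <param>
        <name>main.ldapRealm.contextFactory.systemUsername</name>
        <value>cn=knoxbind,ou=ServiceAccounts,dc=example,dc=com</value>
    </param>
    <param>
        <!-- plain text to start with; swapped for the ${ALIAS=...} form later -->
        <name>main.ldapRealm.contextFactory.systemPassword</name>
        <value>BindAccountPassword</value>
    </param>
    <param>
        <name>main.ldapRealm.userSearchBase</name>
        <value>ou=Users,dc=example,dc=com</value>
    </param>
    <param>
        <name>main.ldapRealm.groupSearchBase</name>
        <value>ou=Groups,dc=example,dc=com</value>
    </param>
</provider>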
One of the last things I did was store the systemPassword value in a protected credential store. This is briefly referenced at the beginning of Part 2 of the Using Apache Knox with ActiveDirectory documentation. However, what is not mentioned is that there is a known bug that will prevent you from testing with knoxcli once this is enabled:
https://issues.apache.org/jira/browse/KNOX-745
That's the reason I did this last. Connections will still work and your systemUsername will be able to run authentication queries against AD, but setting this prematurely will cause headaches with knoxcli testing. One other note that isn't referenced anywhere: for whatever reason, the XML formatting of the systemPassword parameter doesn't seem to work in the short form used in the documentation. That is to say...
This works:
<param>
    <name>main.ldapRealm.contextFactory.systemPassword</name>
    <value>${ALIAS=ldcSystemPassword}</value>
</param>
This does not:
<param name="main.ldapRealm.contextFactory.systemPassword" value="${ALIAS=ldcSystemPassword}"/>
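For reference, the alias referenced above has to exist in Knox's credential store before the topology can resolve it. A sketch of the command, assuming the cluster name matches your topology name (default here) and with the password as a placeholder:
/usr/hdp/current/knox-server/bin/knoxcli.sh create-alias ldcSystemPassword --cluster default --value '[BindAccountPassword]'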
One more step: the JVM running Knox needs to trust the AD server's certificate for ldaps connections. First, get the certificate:
openssl s_client -connect [Server]:[Port]
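Since s_client just dumps the certificate to the terminal, it's handy to capture it straight to a file by piping through openssl x509; something along these lines, where the server, port, and output file are placeholders:
openssl s_client -connect ad.example.com:636 </dev/null 2>/dev/null | openssl x509 -outform PEM > ad-ldaps.pem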
Second, import the certificate into the JAVA_HOME keystore that Knox is using:
- Find the java home directory:
cat gateway.log | grep java.home
- Import the certificate:
keytool -importcert -noprompt -storepass [PW] -file [CertFile] -alias [ALIAS] -keystore [KEYSTORE]
Last, restart the Knox service from Ambari and test.
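To confirm the AD integration end to end, the same style of curl test as before works, this time with an AD account instead of the demo guest user (host, topology, and credentials are placeholders):
curl -i -k -u ad_user:ad_password 'https://knoxhost.example.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS'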
It is important to note that when Knox is restarted, the contents of the following location overwrite the default.xml topology file in the Knox {GATEWAY_HOME}/conf/topologies directory:
Ambari > Services > Knox > Configs > Advanced topology
That is to say, you should copy and paste your final topology information into the Advanced topology section as part of the restart test.
Since I mentioned Zookeeper in the beginning, I'll also let you in on how that's configured in my topology file. The documentation walks you through multiple samples; Sample 7 leaves it to you to determine your service values (I copied mine from the default topology). According to the Hortonworks Community, any service that runs through Zookeeper needs to be defined like so:
<provider>
    <role>ha</role>
    <name>HaProvider</name>
    <enabled>true</enabled>
    <param>
        <name>HIVE</name>
        <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181;zookeeperNamespace=hiveserver2</value>
    </param>
</provider>
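For completeness, my understanding from the Knox documentation is that when the HaProvider handles discovery through Zookeeper like this, the matching HIVE service entry in the topology doesn't need a url at all; a minimal sketch, which you should verify against your own default topology:
<service>
    <role>HIVE</role>
</service>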