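I. Background
Atlas startup errors usually come in two stages:
- HBase permission denied (covered in the previous post);
- Solr 401 Unauthorized (the focus of this post).
II. Error log analysis
When Atlas fails to start, the console prints the following stack trace: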
.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:336)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:334)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:209)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:955)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:591)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:399)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:278)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:103)
at org.apache.atlas.web.setup.KerberosAwareListener.contextInitialized(KerberosAwareListener.java:31)
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1073)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:572)
at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:1002)
at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:765)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1449)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1414)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:916)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
at org.eclipse.jetty.server.Server.start(Server.java:423)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
at org.eclipse.jetty.server.Server.doStart(Server.java:387)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:111)
at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M40S
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:98)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:52)
... 60 common frames omitted
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Unable to complete query on Solr.
at org.janusgraph.diskstorage.solr.SolrIndex.storageException(SolrIndex.java:1312)
at org.janusgraph.diskstorage.solr.SolrIndex.mutate(SolrIndex.java:487)
at org.janusgraph.diskstorage.indexing.IndexTransaction$1.call(IndexTransaction.java:151)
at org.janusgraph.diskstorage.indexing.IndexTransaction$1.call(IndexTransaction.java:148)
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:66)
... 61 common frames omitted
Caused by: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://dev1:8983/solr/vertex_index_shard1_replica_n1: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 Unauthorized access</title>
</head>
<body><h2>HTTP ERROR 401 Unauthorized access</h2>
<table>
<tr><th>URI:</th><td>/solr/vertex_index_shard1_replica_n1/update</td></tr>
<tr><th>STATUS:</th><td>401</td></tr>
<tr><th>MESSAGE:</th><td>Unauthorized access</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>
at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:125)
at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:46)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.directUpdate(BaseCloudSolrClient.java:579)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1076)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866)
at org.janusgraph.diskstorage.solr.SolrIndex.commitChanges(SolrIndex.java:609)
at org.janusgraph.diskstorage.solr.SolrIndex.mutate(SolrIndex.java:482)
... 64 common frames omitted
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://dev1:8983/solr/vertex_index_shard1_replica_n1: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 Unauthorized access</title>
</head>
<body><h2>HTTP ERROR 401 Unauthorized access</h2>
<table>
<tr><th>URI:</th><td>/solr/vertex_index_shard1_replica_n1/update</td></tr>
<tr><th>STATUS:</th><td>401</td></tr>
<tr><th>MESSAGE:</th><td>Unauthorized access</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:635)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)
at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)
at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$directUpdate$0(BaseCloudSolrClient.java:555)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[root@dev1 atlas]#
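The trace shows that:
Jetty itself started, but Atlas was denied access while initializing the Solr index (the 401 comes back from the /update endpoint of vertex_index).
This usually means Solr has Kerberos authentication enabled while the Atlas side has no correctly configured Kerberos HTTP client.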
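Because the trace is long, it can help to reduce it to the deepest cause and the HTTP status before reading further. The following is a minimal shell sketch, not part of the original setup: the function name summarize_failure is made up for illustration, and the file you pass in is whatever captured the console output.

```shell
# Sketch: condense a Java stack trace to its innermost cause and HTTP status.
# summarize_failure is a hypothetical helper name; $1 is any captured log file.
summarize_failure() {
  local log="$1"
  # The last "Caused by:" line is the deepest cause in a chained Java trace.
  grep -F 'Caused by:' "$log" | tail -n 1 \
    | sed -E 's/^[[:space:]]*Caused by: ([^:]+):.*/cause=\1/'
  # Jetty's HTML error page reports the status code in a table row.
  grep -oE 'STATUS:</th><td>[0-9]{3}' "$log" \
    | grep -oE '[0-9]{3}$' | sort -u | sed 's/^/http_status=/'
}
```

Run against the trace above, this should point at the RemoteSolrException and status 401, which is the pair that identifies an authentication problem rather than a missing collection or a network fault.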
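III. Locating the problem
Atlas talks to Solr through the solrj client. Different Solr versions use different Kerberos client factory classes, and a configuration that names the wrong one produces the 401 error.
To confirm this, we check whether the relevant JARs contain a Kerberos client class.
IV. Verifying versions and differences
Run the following commands: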
jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/janusgraph-solr-1.0.0.jar | \
grep -F 'org/apache/solr/client/solrj/impl/Krb5HttpClientConfigurer.class' \
|| echo 'NOT_FOUND'
jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/solr-core-8.11.3.jar | \
grep -F 'org/apache/solr/client/solrj/impl/Krb5HttpClientConfigurer.class' \
|| echo 'NOT_FOUND'
Every command returns:
NOT_FOUND
This means that neither janusgraph-solr nor solr-core, as shipped with this Atlas, contains the old Krb5HttpClientConfigurer class.
Next, check solr-solrj-8.11.3.jar:
jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/solr-solrj-8.11.3.jar \
| egrep 'Krb5HttpClientBuilder|Krb5HttpClientConfigurer'
Output:
[root@dev1 ~]# jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/solr-solrj-8.11.3.jar \
> | egrep 'Krb5HttpClientBuilder|Krb5HttpClientConfigurer'
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder$1.class
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder$2.class
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder$SolrJaasConfiguration.class
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder.class
[root@dev1 ~]#
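Conclusion: Solr 8.x no longer ships the old Krb5HttpClientConfigurer class and uses Krb5HttpClientBuilder instead. The default Atlas template was not updated accordingly, so the HTTP client never loads the Kerberos authentication module and Solr answers with 401.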
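The per-JAR checks above can be folded into one pass over the lib directory. This is a rough sketch rather than the author's tooling: krb_client_class and scan_lib_dir are illustrative names, and the only external command assumed is the same `jar tf` used earlier.

```shell
# Sketch: report which Kerberos HTTP client class a JAR listing contains.
# Reads one entry per line on stdin, in the format printed by `jar tf`.
krb_client_class() {
  local listing
  listing=$(cat)
  if printf '%s\n' "$listing" | grep -qF 'org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder.class'; then
    echo 'Krb5HttpClientBuilder'
  elif printf '%s\n' "$listing" | grep -qF 'org/apache/solr/client/solrj/impl/Krb5HttpClientConfigurer.class'; then
    echo 'Krb5HttpClientConfigurer'
  else
    echo 'NOT_FOUND'
  fi
}

# Hypothetical wrapper: apply the check to every *solr*.jar in a directory.
scan_lib_dir() {
  local dir="$1" jar_file
  for jar_file in "$dir"/*solr*.jar; do
    [ -e "$jar_file" ] || continue   # the glob matched nothing
    printf '%s: %s\n' "$(basename "$jar_file")" "$(jar tf "$jar_file" | krb_client_class)"
  done
}
```

For example, `scan_lib_dir /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib` prints one line per Solr-related JAR, so the JAR that provides the builder class stands out at a glance.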
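V. Solution
With Kerberos enabled, Atlas startup can fail with a 401 because authentication against Solr fails. The related code is open source:
https://github.com/TtBigdata/ambari-env
1. Modify the atlas-env.sh template
Locate the Ambari template:
/usr/bigtop/current/atlas-server/conf/atlas-env.sh
Inside the Kerberos-enabled branch, add the Solr HTTP client parameters (matching the change shown in the figure above):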
{% if security_enabled %}
export ATLAS_OPTS="{{metadata_opts}} \
-Dzookeeper.sasl.client.username={{zk_principal_user}} \
-Djava.security.auth.login.config={{atlas_jaas_file}} \
-Dsolr.kerberos.jaas.appname=Client \
-Djavax.security.auth.useSubjectCredsOnly=false \
-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder \
-Dsun.security.krb5.debug=true"
{% else %}
export ATLAS_OPTS="{{metadata_opts}}"
{% endif %}
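These four key parameters let Atlas enable the Kerberos-authenticated Solr HTTP client.
| Parameter | Purpose |
|---|---|
| -Dsolr.kerberos.jaas.appname=Client | Names the JAAS application (login section) to use |
| -Djavax.security.auth.useSubjectCredsOnly=false | Allows the JVM to obtain and use Kerberos credentials on its own |
| -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder | Selects the new client class Krb5HttpClientBuilder |
| -Dsun.security.krb5.debug=true | Prints Kerberos debug logs |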
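After Ambari renders the template, it is worth confirming that all four flags actually reached the JVM options. Below is a small sketch under that assumption: missing_krb_opts is a hypothetical helper, and it only does substring matching on the string you hand it.

```shell
# Sketch: list which of the four Kerberos-related JVM flags are absent
# from an options string such as the rendered ATLAS_OPTS.
# Returns 0 when all are present, 1 otherwise.
missing_krb_opts() {
  local opts="$1" flag missing=0
  for flag in \
    '-Dsolr.kerberos.jaas.appname=' \
    '-Djavax.security.auth.useSubjectCredsOnly=false' \
    '-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder' \
    '-Dsun.security.krb5.debug='
  do
    case "$opts" in
      *"$flag"*) ;;                                   # flag is present
      *) echo "missing: $flag"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

A typical use would be to source the rendered atlas-env.sh in a throwaway shell and call `missing_krb_opts "$ATLAS_OPTS"`. Empty output means the template change took effect.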
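2. Confirm the Solr version's behavior
We run Solr 8.11.3. In this version only solr-solrj-8.11.3.jar contains the Kerberos builder class.
janusgraph-solr and solr-core, where the old class used to be looked for, no longer carry Kerberos client support, so the new class has to be selected through the JVM options set in atlas-env.sh above.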
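3. If the Solr versions do not match
If the Solr server version differs from the Solr libraries bundled with Atlas, or connection errors persist, also add the following custom configuration.
Add these lines under Custom application-properties: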
atlas.graph.index.search.solr.kerberos.principal=atlas/dev1@TTBIGDATA.COM
atlas.graph.index.search.solr.kerberos.keytab=/etc/security/keytabs/atlas.service.keytab
atlas.graph.index.search.solr.kerberos.enable=true
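These settings are meant to force Kerberos authentication at the Solr layer even when the HTTP client factory has not been loaded.
Edit Custom application-properties as shown in the figure above.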
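Once Ambari has pushed the change, a quick check of the rendered properties file confirms that the three keys are present and not commented out. This is a sketch only: missing_solr_krb_props is an illustrative name, and the location of the rendered file depends on your installation, so it is passed in as an argument.

```shell
# Sketch: print each expected Solr Kerberos key that is absent from, or
# commented out in, a Java properties file. Returns 1 if any key is missing.
missing_solr_krb_props() {
  local file="$1" key pat rc=0
  for key in \
    atlas.graph.index.search.solr.kerberos.enable \
    atlas.graph.index.search.solr.kerberos.principal \
    atlas.graph.index.search.solr.kerberos.keytab
  do
    # Escape the dots so they match literally in the regular expression.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    grep -qE "^[[:space:]]*${pat}[[:space:]]*=" "$file" \
      || { echo "missing: $key"; rc=1; }
  done
  return "$rc"
}
```

Call it with the path to the Atlas application properties file on the Atlas host. No output and exit status 0 indicate that all three settings were written.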