I. Problem Background

Atlas startup failures usually come in two stages:

  1. HBase permission denied (covered in the previous article);
  2. Solr 401 Unauthorized (the focus of this article).

II. Error Log Analysis

When Atlas fails to start, the console prints the following stack trace:

.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
        at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:336)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:334)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:209)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:955)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:591)
        at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:399)
        at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:278)
        at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:103)
        at org.apache.atlas.web.setup.KerberosAwareListener.contextInitialized(KerberosAwareListener.java:31)
        at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1073)
        at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:572)
        at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:1002)
        at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:765)
        at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379)
        at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1449)
        at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1414)
        at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:916)
        at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288)
        at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
        at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
        at org.eclipse.jetty.server.Server.start(Server.java:423)
        at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
        at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
        at org.eclipse.jetty.server.Server.doStart(Server.java:387)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
        at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:111)
        at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M40S
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:98)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:52)
        ... 60 common frames omitted
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Unable to complete query on Solr.
        at org.janusgraph.diskstorage.solr.SolrIndex.storageException(SolrIndex.java:1312)
        at org.janusgraph.diskstorage.solr.SolrIndex.mutate(SolrIndex.java:487)
        at org.janusgraph.diskstorage.indexing.IndexTransaction$1.call(IndexTransaction.java:151)
        at org.janusgraph.diskstorage.indexing.IndexTransaction$1.call(IndexTransaction.java:148)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:66)
        ... 61 common frames omitted
Caused by: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://dev1:8983/solr/vertex_index_shard1_replica_n1: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 Unauthorized access</title>
</head>
<body><h2>HTTP ERROR 401 Unauthorized access</h2>
<table>
<tr><th>URI:</th><td>/solr/vertex_index_shard1_replica_n1/update</td></tr>
<tr><th>STATUS:</th><td>401</td></tr>
<tr><th>MESSAGE:</th><td>Unauthorized access</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>

        at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:125)
        at org.apache.solr.client.solrj.impl.CloudSolrClient.getRouteException(CloudSolrClient.java:46)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.directUpdate(BaseCloudSolrClient.java:579)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1076)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866)
        at org.janusgraph.diskstorage.solr.SolrIndex.commitChanges(SolrIndex.java:609)
        at org.janusgraph.diskstorage.solr.SolrIndex.mutate(SolrIndex.java:482)
        ... 64 common frames omitted
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://dev1:8983/solr/vertex_index_shard1_replica_n1: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 Unauthorized access</title>
</head>
<body><h2>HTTP ERROR 401 Unauthorized access</h2>
<table>
<tr><th>URI:</th><td>/solr/vertex_index_shard1_replica_n1/update</td></tr>
<tr><th>STATUS:</th><td>401</td></tr>
<tr><th>MESSAGE:</th><td>Unauthorized access</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>

        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:635)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
        at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)
        at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$directUpdate$0(BaseCloudSolrClient.java:555)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[root@dev1 atlas]#

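As the trace shows:

Atlas started Jetty successfully, but was denied access while initializing the Solr index.

This usually happens because Solr has Kerberos authentication enabled while the Atlas side has not correctly configured a Kerberos HTTP client.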
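With a trace this long, it can help to pull out just the rejected Solr endpoint and the number of 401 error pages. The following is a minimal sketch, assuming the log is a plain-text file; the function name summarize_solr_401 is hypothetical and not something Atlas provides.

```shell
#!/usr/bin/env bash
# Summarize Solr authentication failures found in an Atlas log file.
# summarize_solr_401 is a hypothetical helper name (an assumption of this sketch).
summarize_solr_401() {
  local log_file="$1"
  # Distinct Solr endpoints that rejected a request,
  # taken from the "Error from server at <url>:" fragments.
  grep -o 'Error from server at [^ ]*' "$log_file" \
    | sed -e 's/^Error from server at //' -e 's/:$//' \
    | sort -u
  # Number of Jetty error pages that report HTTP 401.
  printf '401 responses: %s\n' "$(grep -c 'HTTP ERROR 401' "$log_file")"
}
```

A call such as `summarize_solr_401 /var/log/atlas/application.log` would then list the endpoints; that log path is an assumption and should be replaced with wherever your Atlas log is written.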


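III. Locating the Problem

Atlas talks to Solr through the SolrJ client. Different Solr versions use different Kerberos client factory classes, and a configuration that does not match the shipped class results in a 401 error.

To confirm this, we check whether the relevant JARs contain the Kerberos client classes.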

IV. Verifying Versions and Confirming the Difference

Run the following commands:

jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/janusgraph-solr-1.0.0.jar | \
grep -F 'org/apache/solr/client/solrj/impl/Krb5HttpClientConfigurer.class' \
|| echo 'NOT_FOUND'

jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/solr-core-8.11.3.jar  | \
grep -F 'org/apache/solr/client/solrj/impl/Krb5HttpClientConfigurer.class' \
|| echo 'NOT_FOUND'

Every command prints:

NOT_FOUND


This means the janusgraph-solr and solr-core JARs that Atlas currently uses no longer contain the old Krb5HttpClientConfigurer class.

Next, check solr-solrj-8.11.3.jar:

jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/solr-solrj-8.11.3.jar \
 | egrep 'Krb5HttpClientBuilder|Krb5HttpClientConfigurer'

The output is:

[root@dev1 ~]# jar tf /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib/solr-solrj-8.11.3.jar \
>  | egrep 'Krb5HttpClientBuilder|Krb5HttpClientConfigurer'
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder$1.class
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder$2.class
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder$SolrJaasConfiguration.class
org/apache/solr/client/solrj/impl/Krb5HttpClientBuilder.class
[root@dev1 ~]#


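Conclusion: Solr 8.x no longer ships the old Krb5HttpClientConfigurer class and uses Krb5HttpClientBuilder instead. The default Atlas template has not been updated to match, so the HTTP client never loads the Kerberos authentication module and Solr answers with 401.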
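Instead of checking JARs one at a time, the same test can be run over the whole lib directory. This is only a sketch: find_class_in_jars and list_jar are hypothetical helper names, and it assumes `jar` is on the PATH (the comment shows an `unzip` alternative).

```shell
#!/usr/bin/env bash
# List the entries of one JAR. Kept separate so it can be swapped for
# `unzip -Z1 "$1"` on hosts that have no JDK on the PATH.
list_jar() {
  jar tf "$1"
}

# Print every JAR in a directory whose listing contains the given class path.
# Returns 0 if at least one JAR matched, 1 otherwise.
# find_class_in_jars is a hypothetical helper, not part of Atlas or Solr.
find_class_in_jars() {
  local lib_dir="$1" class_path="$2" jar_file found=1
  for jar_file in "$lib_dir"/*.jar; do
    [ -e "$jar_file" ] || continue   # directory contains no JARs
    if list_jar "$jar_file" 2>/dev/null | grep -qF "$class_path"; then
      printf '%s\n' "$jar_file"
      found=0
    fi
  done
  return "$found"
}
```

For the installation described here, the call would be `find_class_in_jars /usr/bigtop/current/atlas-server/server/webapp/atlas/WEB-INF/lib 'solrj/impl/Krb5HttpClientBuilder.class'`; given the results above, it should name only the solr-solrj JAR.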

V. Solution

Once Kerberos is enabled, Atlas may fail at startup with a 401 because Solr authentication fails. The related code is open source:


https://github.com/TtBigdata/ambari-env

1. Modify the atlas-env.sh template

Locate the Ambari template:

/usr/bigtop/current/atlas-server/conf/atlas-env.sh

Inside the Kerberos-enabled branch, add the Solr HTTP client parameters (the screenshot shows the same change):

[Screenshot: the modified atlas-env template]

{% if security_enabled %}
export ATLAS_OPTS="{{metadata_opts}} \
-Dzookeeper.sasl.client.username={{zk_principal_user}} \
-Djava.security.auth.login.config={{atlas_jaas_file}} \
-Dsolr.kerberos.jaas.appname=Client \
-Djavax.security.auth.useSubjectCredsOnly=false \
-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder \
-Dsun.security.krb5.debug=true"
{% else %}
export ATLAS_OPTS="{{metadata_opts}}"
{% endif %}

These four key parameters let Atlas correctly enable the Kerberos-authenticated Solr HTTP client.

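Parameter                                        Purpose
solr.kerberos.jaas.appname                       Names the JAAS login entry to use
javax.security.auth.useSubjectCredsOnly=false    Allows the JVM to obtain Kerberos credentials itself
solr.httpclient.builder.factory                  Selects the new client class, Krb5HttpClientBuilder
sun.security.krb5.debug                          Prints Kerberos debug logs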
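After restarting Atlas, it is worth confirming that the flags actually reached the JVM. The sketch below checks a command-line string for the flags from the template; missing_solr_krb_opts is a hypothetical helper name, and how you obtain the Atlas process ID is left as an assumption.

```shell
#!/usr/bin/env bash
# Report which Kerberos-related JVM flags are absent from a command line.
# missing_solr_krb_opts is a hypothetical helper; the flag list mirrors
# the ATLAS_OPTS template shown in this section.
missing_solr_krb_opts() {
  local cmdline="$1" flag
  for flag in \
    '-Dsolr.kerberos.jaas.appname=' \
    '-Djavax.security.auth.useSubjectCredsOnly=false' \
    '-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder' \
    '-Djava.security.auth.login.config='
  do
    case "$cmdline" in
      *"$flag"*) ;;                              # flag is present
      *) printf 'missing: %s\n' "$flag" ;;       # flag never reached the JVM
    esac
  done
}
```

On Linux, one way to feed it the live command line is `missing_solr_krb_opts "$(tr '\0' ' ' < /proc/$ATLAS_PID/cmdline)"`, where ATLAS_PID is a placeholder for the Atlas process ID. No output means all four flags are present.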

2. Confirm what the Solr version ships

We use Solr 8.11.3. In this version, only solr-solrj-8.11.3.jar contains the Kerberos builder class.

The janusgraph-solr and solr-core JARs no longer contain Kerberos support, so the new class has to be loaded through the ATLAS_OPTS settings above.

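3. If the Solr versions do not match

If the Solr version differs from the packages Atlas uses, or connection errors persist, you can additionally supply custom configuration:

[Screenshot: Custom application-properties in Ambari]

Add the following to Custom application-properties: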

atlas.graph.index.search.solr.kerberos.principal=atlas/dev1@TTBIGDATA.COM
atlas.graph.index.search.solr.kerberos.keytab=/etc/security/keytabs/atlas.service.keytab
atlas.graph.index.search.solr.kerberos.enable=true

This is meant to force Kerberos authentication at the Solr layer even if the HTTP client factory was not loaded.

Modify Custom application-properties as shown in the screenshot above.
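Once Ambari has pushed the configuration, a quick check can confirm that the three keys landed in the rendered properties file. This is a sketch only: check_solr_krb_props is a hypothetical helper name, and the properties file location is an assumption.

```shell
#!/usr/bin/env bash
# Print each expected Solr Kerberos key that is not set in a properties file.
# check_solr_krb_props is a hypothetical helper name (an assumption of this sketch).
check_solr_krb_props() {
  local props_file="$1" key
  for key in \
    atlas.graph.index.search.solr.kerberos.principal \
    atlas.graph.index.search.solr.kerberos.keytab \
    atlas.graph.index.search.solr.kerberos.enable
  do
    # Drop commented-out lines first, then look for a literal "key=".
    grep -v '^[[:space:]]*#' "$props_file" | grep -qF "${key}=" \
      || printf 'not set: %s\n' "$key"
  done
}
```

A typical call would be `check_solr_krb_props /usr/bigtop/current/atlas-server/conf/atlas-application.properties`; the file name is assumed from the usual Atlas layout, so adjust it if your installation differs. No output means all three keys are present.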