Upgrading TiSpark v2.4.x to TiSpark v2.5.x

Source: 恒创科技 Editor: 恒创科技 editorial department
2022-08-12 16:06:52

Author: 边城元元




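1. Background

When installing TiDB v6.0, the TiSpark cluster was installed by scaling out with TiUP. The highest version available that way is TiSpark v2.4.1; the latest release, TiSpark v2.5.1, is not offered. In addition, TiSpark v2.5.0 and later implement some authentication and authorization features.

This post mainly tries out:

Upgrading TiSpark v2.4.1 to TiSpark v2.5.1
The authentication and authorization features of TiSpark v2.5.1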


2. Environment setup

2.1 Install Cluster111 (v6.0.0)


2.1.1 Cluster111 topology
# cluster111.yml
server_configs:
  tidb:
    log.slow-threshold: 300
    binlog.enable: false
    binlog.ignore-error: false
  tikv:
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
    replication.location-labels:
      - host

pd_servers:
  - host: 10.0.2.15
    # ssh_port: 22
    # name: "pd-1"
    client_port: 2379
    # peer_port: 2380

tidb_servers:
  - host: 10.0.2.15

tikv_servers:
  - host: 10.0.2.15
    # ssh_port: 22
    port: 20160
    status_port: 20180
    config:
      server.grpc-concurrency: 4

monitoring_servers:
  - host: 10.0.2.15

grafana_servers:
  - host: 10.0.2.15

alertmanager_servers:
  - host: 10.0.2.15


2.1.2 Install Cluster111
# Install TiUP
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
source /root/.bash_profile

tiup update cluster
tiup cluster list

# Check the environment configuration and try to fix any problems
tiup cluster check ./cluster111.yml --user root -p --apply
# Deploy cluster111
tiup cluster deploy cluster111 v6.0.0 ./cluster111.yml --user root -p
# Start the cluster
tiup cluster start cluster111
tiup cluster display cluster111

2.2 TiSpark v2.4.1

2.2.1 Topology
# cluster111-v6.0.0-tispark.yml
tispark_masters:
  - host: 10.0.2.15
    ssh_port: 22
    port: 7077
# NOTE: multiple worker nodes on the same host is not supported by Spark
tispark_workers:
  - host: 10.0.2.15

2.2.2 Install TiSpark
Install openjdk8 (omitted)
Install TiSpark by scaling out
tiup cluster scale-out cluster111 ./cluster111-v6.0.0-tispark.yml -uroot -p


2.3 Test Spark v2.4.3 Standalone
Add the following to spark-defaults.conf
# SQL extension class
spark.sql.extensions org.apache.spark.sql.TiExtensions
# Master node
spark.master spark://10.0.2.15:7077
# PD nodes; separate multiple PD addresses with commas, e.g. 10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379
spark.tispark.pd.addresses 10.0.2.15:2379
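
The same settings get edited again in section 4, so it can help to apply them with a small helper instead of by hand. The following is only a sketch: `set_spark_conf` is a hypothetical function name, not part of Spark or TiUP, and it assumes the whitespace-separated `key value` layout used in this post.

```shell
# set_spark_conf FILE KEY VALUE   (hypothetical helper, not a Spark tool)
# Drops any existing line whose first field is KEY, then appends "KEY VALUE"
# at the end of FILE. Comment lines and unrelated keys are left untouched.
set_spark_conf() (
  file="$1"; key="$2"; value="$3"
  touch "$file" || exit 1
  tmp="$(mktemp)" || exit 1
  awk -v k="$key" '$1 != k' "$file" > "$tmp"
  printf '%s %s\n' "$key" "$value" >> "$tmp"
  # Write back with cat so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)

# Example (path taken from this post):
# conf=/tidb-deploy/tispark-master-7077/conf/spark-defaults.conf
# set_spark_conf "$conf" spark.sql.extensions org.apache.spark.sql.TiExtensions
# set_spark_conf "$conf" spark.master spark://10.0.2.15:7077
# set_spark_conf "$conf" spark.tispark.pd.addresses 10.0.2.15:2379
```

Running the helper twice with the same key leaves a single line for that key, which avoids the duplicate entries that repeated manual edits tend to produce.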
Start the Spark cluster

/tidb-deploy/tispark-master-7077/sbin/start-all.sh

Start spark-shell
# Start spark-shell
/tidb-deploy/tispark-master-7077/bin/spark-shell

# Then run: spark.sql("select ti_version()").collect


Start spark-sql
# Start spark-sql
/tidb-deploy/tispark-master-7077/bin/spark-sql
# Then run: select ti_version();


3. Upgrading TiSpark

3.1 Download the upgrade packages

# Download Spark v3.1.3 (-O saves under the remote file name and takes no argument)
curl -L -O "https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop3.2.tgz"
# Download TiSpark v2.5.1
curl -L -O "https://github.com/pingcap/tispark/releases/download/v2.5.1/tispark-assembly-3.1-2.5.1.jar"
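
Without `-f`, curl stores the body of an HTTP error response under the requested file name, so a failed download can look like a success. A rough sanity check before continuing might look like the sketch below; `check_tgz` and `check_jar` are hypothetical helper names, and the jar check only looks at the two-byte ZIP signature rather than validating the whole archive.

```shell
# check_tgz FILE: succeed only if FILE is non-empty and tar can list it
# as a gzip-compressed archive. (Hypothetical helper.)
check_tgz() (
  [ -s "$1" ] && tar -tzf "$1" > /dev/null 2>&1
)

# check_jar FILE: succeed only if FILE starts with the ZIP signature "PK".
# A jar is a ZIP file, so an HTML error page fails this test. (Hypothetical helper.)
check_jar() (
  [ "$(dd if="$1" bs=1 count=2 2>/dev/null)" = "PK" ]
)

# Example:
# check_tgz spark-3.1.3-bin-hadoop3.2.tgz || echo "Spark download looks broken"
# check_jar tispark-assembly-3.1-2.5.1.jar || echo "TiSpark download looks broken"
```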


3.2 Backup
\cp -rf /tidb-deploy/tispark-master-7077 /tidb-deploy/tispark-master-7077-bak2.4.1
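
Since the next step replaces the deployment directory, it may be worth confirming that the backup is complete first. The sketch below wraps the copy in a hypothetical `backup_dir` function that refuses to overwrite an existing backup and compares the two trees afterwards; it assumes `cp -Rp` and `diff -r` are available, as they are on typical Linux hosts.

```shell
# backup_dir SRC DST   (hypothetical helper)
# Copies SRC to DST, preserving modes and timestamps, and fails if
# SRC is missing, DST already exists, or the copy differs from the source.
backup_dir() (
  src="$1"; dst="$2"
  if [ ! -d "$src" ]; then
    echo "missing source directory: $src" >&2
    exit 1
  fi
  if [ -e "$dst" ]; then
    echo "backup target already exists: $dst" >&2
    exit 1
  fi
  cp -Rp "$src" "$dst" && diff -r "$src" "$dst" > /dev/null
)

# Example (paths taken from this post):
# backup_dir /tidb-deploy/tispark-master-7077 /tidb-deploy/tispark-master-7077-bak2.4.1
```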


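3.3 Upgrade
# Stop the old Spark cluster before replacing its files
/tidb-deploy/tispark-master-7077/sbin/stop-all.sh
# Replace Spark
mkdir -p /usr/local0/webserver/tispark && tar -zxvf spark-3.1.3-bin-hadoop3.2.tgz -C /usr/local0/webserver/tispark/
# Remove the old directory (only if the backup from 3.2 exists); otherwise mv would nest the new Spark inside it
[ -d /tidb-deploy/tispark-master-7077-bak2.4.1 ] && rm -rf /tidb-deploy/tispark-master-7077
mv /usr/local0/webserver/tispark/spark-3.1.3-bin-hadoop3.2 /tidb-deploy/tispark-master-7077
chown -R tidb:tidb /tidb-deploy/tispark-master-7077
# Replace the TiSpark jar
cp -f tispark-assembly-3.1-2.5.1.jar /tidb-deploy/tispark-master-7077/jars/
# Configuration files
cp -rf /tidb-deploy/tispark-master-7077-bak2.4.1/conf/* /tidb-deploy/tispark-master-7077/conf/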
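
When `mv` is given a target directory that already exists, it moves the source inside it instead of replacing it, and a leftover old TiSpark jar next to the new one can also cause confusion. A quick layout check before starting the cluster can catch both. This is a sketch only: `check_spark_home` is a hypothetical name, and the three things it checks are simply the paths this post relies on.

```shell
# check_spark_home DIR   (hypothetical helper)
# Succeeds only if DIR looks like a usable Spark home for this walkthrough:
# it has sbin/start-all.sh and conf/spark-defaults.conf, and jars/ holds
# exactly one tispark-assembly-*.jar.
check_spark_home() (
  spark_home="$1"
  for f in sbin/start-all.sh conf/spark-defaults.conf; do
    if [ ! -f "$spark_home/$f" ]; then
      echo "missing $spark_home/$f" >&2
      exit 1
    fi
  done
  # Expand the glob into the positional parameters and count the matches.
  set -- "$spark_home"/jars/tispark-assembly-*.jar
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one tispark-assembly jar in $spark_home/jars" >&2
    exit 1
  fi
)

# Example:
# check_spark_home /tidb-deploy/tispark-master-7077 && echo "layout looks fine"
```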

3.4 Test
Start the Spark cluster

/tidb-deploy/tispark-master-7077/sbin/start-all.sh

Start spark-shell
# Start spark-shell
/tidb-deploy/tispark-master-7077/bin/spark-shell

# Then run: spark.sql("select ti_version()").collect


Start spark-sql
# Start spark-sql
/tidb-deploy/tispark-master-7077/bin/spark-sql
# Then run: select ti_version();


4. Testing TiSpark v2.5.1 authentication

Reference: https://github.com/pingcap/tispark/blob/master/docs/authorization_userguide.md

Authorization and authentication through TiDB server

The database user account must have the PROCESS privilege.
TiSpark version >= 2.5.0
Spark version = 3.0.x or 3.1.x

4.1 Add the following to spark-defaults.conf
spark.sql.tidb.addr 10.0.2.15
spark.sql.tidb.port 4000
spark.sql.tidb.user root
spark.sql.tidb.password abc

# Must config in conf file
spark.sql.auth.enable true
# in seconds. Values range from 5 to 3600
spark.sql.tidb.auth.refreshInterval 30
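
The comment above gives a valid range of 5 to 3600 seconds for the refresh interval. A small pre-flight check can catch an out-of-range value before Spark is started. This is a sketch under assumptions: `get_conf` and `valid_refresh_interval` are hypothetical helpers, and `get_conf` only understands the whitespace-separated `key value` lines used in this post.

```shell
# get_conf FILE KEY   (hypothetical helper)
# Prints the value from the last "KEY VALUE" line in FILE, or an empty
# line if KEY is not set.
get_conf() (
  awk -v k="$2" '$1 == k { v = $2 } END { print v }' "$1"
)

# valid_refresh_interval N   (hypothetical helper)
# Succeeds only if N is a whole number between 5 and 3600 inclusive.
valid_refresh_interval() (
  case "$1" in
    '' | *[!0-9]*) exit 1 ;;
  esac
  [ "$1" -ge 5 ] && [ "$1" -le 3600 ]
)

# Example:
# conf=/tidb-deploy/tispark-master-7077/conf/spark-defaults.conf
# valid_refresh_interval "$(get_conf "$conf" spark.sql.tidb.auth.refreshInterval)" \
#   || echo "refreshInterval must be between 5 and 3600 seconds"
```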


4.2 Configure a wrong password
# This password is wrong
spark.sql.tidb.password abc

After starting spark-sql, running a SQL statement fails with an error.


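4.3 Correct the password
# Empty password
spark.sql.tidb.password

# With the setting below, authorization info is refreshed every 30 s (a newly configured spark.sql.tidb.password is only used by newly started spark-sql sessions)
spark.sql.tidb.auth.refreshInterval 30

Start spark-sql

/tidb-deploy/tispark-master-7077/bin/spark-sql
use tidb_catalog;
show databases;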


select 'CUSTOMER' tablename , count(*) ct from tidb_catalog.TPCH_001.CUSTOMER union all
select 'NATION' tablename , count(*) ct from tidb_catalog.TPCH_001.NATION union all
select 'REGION' tablename , count(*) ct from tidb_catalog.TPCH_001.REGION union all
select 'PART' tablename , count(*) ct from tidb_catalog.TPCH_001.PART union all
select 'SUPPLIER' tablename , count(*) ct from tidb_catalog.TPCH_001.SUPPLIER union all
select 'PARTSUPP' tablename , count(*) ct from tidb_catalog.TPCH_001.PARTSUPP union all
select 'ORDERS' tablename , count(*) ct from tidb_catalog.TPCH_001.ORDERS union all
select 'LINEITEM' tablename , count(*) ct from tidb_catalog.TPCH_001.LINEITEM order by ct desc;
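
The row-count query above repeats one pattern per table, so it can also be generated from a table list. The sketch below builds the same kind of statement; `count_query` is a hypothetical helper, and the database and table names are the ones used in this post.

```shell
# count_query DB TABLE...   (hypothetical helper)
# Prints one "select ... count(*)" per table, joined with "union all",
# followed by "order by ct desc;".
count_query() (
  db="$1"; shift
  sep=""
  for t in "$@"; do
    printf "%sselect '%s' tablename, count(*) ct from tidb_catalog.%s.%s" "$sep" "$t" "$db" "$t"
    sep=" union all
"
  done
  printf ' order by ct desc;\n'
)

# Example:
# count_query TPCH_001 CUSTOMER NATION REGION PART SUPPLIER PARTSUPP ORDERS LINEITEM
```

The generated statement can be pasted into the spark-sql prompt, or passed non-interactively through the `-e` option of spark-sql, for example `spark-sql -e "$(count_query TPCH_001 NATION REGION)"`.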



4.4 Configure the password in SparkSession
// Replace each your_tidb_server_* placeholder with a String value, e.g. "10.0.2.15" and "4000"
spark.sqlContext.setConf("spark.sql.tidb.addr", your_tidb_server_address)
spark.sqlContext.setConf("spark.sql.tidb.port", your_tidb_server_port)
spark.sqlContext.setConf("spark.sql.tidb.user", your_tidb_server_user)
spark.sqlContext.setConf("spark.sql.tidb.password", your_tidb_server_password)


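4.5 Limitations
Cannot be used together with data sources other than TiDB.
Role-based privileges are not supported.
The TiDB Data Source API (for example, TiBatchWrite) is not supported.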


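5. Summary
This post walked through upgrading to TiSpark v2.5.1 when tiup list tispark --all does not offer any TiSpark v2.5.x release, and tried out the authentication feature newly supported in TiSpark v2.5.x.

Thanks!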


References

https://tidb.net/blog/19eeb447#Spark

Spark Standalone cluster upgrade steps

https://tidb.net/blog/b8f902a9#TiSpark

Migrating from TiSpark 2.4.1 (Spark 2.4.5) to TiSpark 2.5.0 (Spark 3.0.X/3.1.X) in practice

https://github.com/pingcap/tispark/blob/master/docs/authorization_userguide.md
