Tech article share: Who says Alibaba Cloud can't run Oracle? A Zhuyun architect shows you how!

News, 2024-07-13


· About HAVIP on Alibaba Cloud

Alibaba Cloud's official documentation describes it as follows:

A Private High-Availability Virtual IP Address (HaVip) is a private IP resource that can be created and released independently. What makes it special is that users can announce this IP from an ECS instance using a network protocol.

A HaVip can be bound to at most two ECS instances; a bound instance can announce the private IP via ARP.

An ECS instance that already holds a regular private IP can additionally announce several HaVip private IPs, and thus hold multiple private IPs at the same time.

Because a private IP can be announced from the ECS instance itself, you can build high-availability solutions based on the VRRP protocol, including mature open-source options such as keepalived and heartbeat.

A HaVip can be bound to an EIP, so that when the HaVip moves between ECS instances, traffic sent to the EIP is also redirected to the new ECS instance.

HaVip is supported only in VPC networks; it is not available in the Classic network.

That is a lot of text to digest, so let's go straight to the diagram.

In VPC vpc-23c099ge5 there is a vSwitch, vsw-23a3275jc, and under it a HAVIP, 10.10.1.99. Two ECS instances are attached to this high-availability virtual IP, so the address 10.10.1.99 can float between the two of them.

· What is Keepalived?

Keepalived is software that works much like layer 3, layer 4 and layer 7 switching. Its job is to monitor the state of web servers: if a web server crashes or malfunctions, Keepalived detects it and removes the faulty server from the system; once the server works normally again, Keepalived automatically adds it back to the server pool. All of this happens automatically without manual intervention; the only manual work is repairing the failed web server.

Here, however, keepalived's role is to communicate with the HAVIP, so that the HAVIP points to the ECS instance on which our keepalived service is currently running as master.

· High availability for Oracle on Alibaba Cloud

DBAs with an Oracle background know that Oracle's high-availability cluster is Real Application Clusters (RAC). However, building a RAC cluster has several hard requirements:

The network communication mode must be broadcast.

There must be two networks, one for the heartbeat link and one for public service.

Shared storage is required (this can be solved with an NFS mount).

There is no way around this broadcast requirement on Alibaba Cloud, not to mention that in a VPC environment an instance can have only one NIC address, so we have to look at other approaches to a high-availability environment for Oracle DB.

Anyone who has used Alibaba Cloud RDS knows that RDS achieves high availability by switching service between a primary and a standby instance: when the primary fails, the standby takes over within a short time. The cost is that sessions are disconnected during the switch, causing a brief database outage of about 30 seconds. Our Oracle high-availability setup on Alibaba Cloud works in a similar way.
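Because a switch drops existing sessions, applications have to reconnect once the database is back. A minimal sketch of a reconnect loop, assuming the probe is any command that exits 0 once the database accepts connections; the helper name `retry_until` is hypothetical and not part of the original article:

```shell
#!/bin/bash
# Hypothetical helper: keep running a probe command until it succeeds or
# DEADLINE seconds of wall-clock time have passed.
# Usage: retry_until DEADLINE INTERVAL command [args...]
retry_until() {
  local deadline=$1 interval=$2
  shift 2
  local start=$SECONDS            # bash's built-in seconds counter
  until "$@"; do
    if (( SECONDS - start >= deadline )); then
      return 1                    # gave up: the service did not come back in time
    fi
    sleep "$interval"             # wait before probing again
  done
  return 0                        # probe succeeded
}
```

For example, `retry_until 60 2 tnsping ORCL` would wait up to about a minute for a hypothetical TNS alias `ORCL` to answer, which leaves headroom over the roughly 30-second switch window. Measuring elapsed time with `$SECONDS` rather than counting attempts keeps the deadline honest even when the probe itself is slow.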

Back to the topic: next, let's look at how to deploy this high-availability solution on Alibaba Cloud.

The traditional Data Guard architecture is shown below:

Typically there are two ECS instances, with IP addresses 10.10.1.2 and 10.10.1.3, running an Oracle PRIMARY-STANDBY pair. This Data Guard setup is managed with Oracle DG Broker, and when the PRIMARY database crashes, the standby takes over automatically. However, clients reach an Oracle database through a listener, and the two ECS instances have different default IP addresses. So after the standby takes over, the application's database connection pool must be changed to 10.10.1.3 before it can connect again. Changing the connection pool address means restarting the container, and if the application has to be restarted, the setup can hardly be called highly available. Fortunately, Alibaba Cloud offers a service called HAVIP.

Let's look at the figure below.

Here we add the HAVIP on top of each ECS instance's own IP. The application reaches the database service through the IP address 10.10.1.99, and after the PRIMARY and STANDBY roles are swapped, it still uses 10.10.1.99; the address has simply drifted to what used to be our standby database. Oracle RAC requires shared storage, which means that if the physical files are damaged, the whole database service still goes down. The HAVIP + Data Guard solution above provides disaster recovery at the physical database level and also allows fast takeover after the database service stops.
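The practical payoff is that the application's connection string names the HAVIP rather than either host. A minimal sketch, assuming a JDBC thin-driver client, the default listener port 1521 and a hypothetical SID `ORCL` (the article only requires that both hosts use the same SID); `jdbc_url_for` is an illustrative helper:

```shell
#!/bin/bash
# Hypothetical helper: build the JDBC thin-driver URL (host:port:SID form)
# that an application's connection pool would be configured with.
# Usage: jdbc_url_for HOST SID [PORT]
jdbc_url_for() {
  local host=$1 sid=$2 port=${3:-1521}   # 1521 is Oracle's default listener port
  printf 'jdbc:oracle:thin:@%s:%s:%s\n' "$host" "$port" "$sid"
}

# Point the pool at the HAVIP, not at 10.10.1.2 or 10.10.1.3,
# so the URL stays the same after a role switch.
jdbc_url_for 10.10.1.99 ORCL
```

This only works if a listener on the current primary is listening on 10.10.1.99, which is the purpose of `listener2` in the standby's check script.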

That is the architecture of the solution. It sounds simple, but implementing it is tricky. There are two main difficulties:

If the ECS instance keeps running and only the database roles are switched, how does the HAVIP drift?

If the ECS instance is forcibly stopped, how does the HAVIP drift to the standby environment?

To solve these two problems we use keepalived. Let's look at how it is implemented.

· Implementation approach

1: First, create a Data Guard environment. So that the connection pool needs no change after a switchover, the database SID on both ECS instances must be identical.

2: Oracle Data Guard is managed through DG Broker: when the primary DB crashes, the physical standby DB is automatically switched to primary. The observer must be started on the STANDBY host. We tried, from the management console, to force …

    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_http_port
    }
    virtual_ipaddress {
        10.10.1.99 dev eth0 label eth0:havip  # HAVIP
    }
    unicast_src_ip 10.10.1.2  # local IP
    unicast_peer {
        10.10.1.3  # standby host IP
    }
}

Backup configuration file

keepalived.conf

! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}

vrrp_script chk_http_port {
    script "/etc/keepalived/scheckdb.sh"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_http_port
    }
    virtual_ipaddress {
        10.10.1.99 dev eth0 label eth0:havip
    }
    unicast_src_ip 10.10.1.3
    unicast_peer {
        10.10.1.2
    }
}

Now consider two scenarios after keepalived has started.

1. 10.10.1.2 uses the master (primary) configuration file and 10.10.1.3 uses the backup (standby) configuration file. The primary and standby databases swap roles, but the HAVIP stays bound to the master, i.e. 10.10.1.2, which now hosts the standby database.

2. 10.10.1.2 uses the master (primary) configuration file and 10.10.1.3 uses the backup (standby) configuration file. We forcibly shut down the ECS instance 10.10.1.2: the HAVIP drifts to the backup host and the standby database becomes primary. But when we start 10.10.1.2 again, the HAVIP drifts back to the ECS instance that holds the master configuration file, and the database service can no longer be reached through the HAVIP.
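Scenario 2 follows from how VRRP elects a holder. Below is a simplified model, not keepalived's actual code, assuming that both hosts share `virtual_router_id 51`, that preemption is on (keepalived's default unless `nopreempt` is set), and that a tracked script with a positive `weight` adds that weight to the priority while it exits 0. Ties, which real VRRP breaks by comparing IP addresses, are only reported. The helper names are hypothetical:

```shell
#!/bin/bash
# Simplified, hypothetical model of the VRRP election between the two hosts.

# effective_priority BASE WEIGHT SCRIPT_OK   (SCRIPT_OK: 1 if the track script exits 0)
effective_priority() {
  local base=$1 weight=$2 script_ok=$3
  if (( script_ok )); then
    echo $(( base + weight ))     # positive weight is added while the script succeeds
  else
    echo "$base"
  fi
}

# vip_holder IP_A UP_A PRIO_A IP_B UP_B PRIO_B   (UP_*: 1 = host running)
# Prints the IP that holds the VIP; with preemption the higher priority always wins.
vip_holder() {
  local ip_a=$1 up_a=$2 prio_a=$3 ip_b=$4 up_b=$5 prio_b=$6
  if (( up_a && up_b )); then
    if (( prio_a > prio_b )); then
      echo "$ip_a"
    elif (( prio_b > prio_a )); then
      echo "$ip_b"
    else
      echo "tie"                  # real VRRP breaks ties by IP address; not modeled here
    fi
  elif (( up_a )); then
    echo "$ip_a"
  elif (( up_b )); then
    echo "$ip_b"
  else
    echo "none"
  fi
}

# Scenario 2: master template on 10.10.1.2 (priority 100), backup on 10.10.1.3 (99), weight 2.
master=$(effective_priority 100 2 1)    # 102
backup=$(effective_priority 99 2 1)     # 101
echo "10.10.1.2 stopped:   VIP on $(vip_holder 10.10.1.2 0 "$master" 10.10.1.3 1 "$backup")"
echo "10.10.1.2 restarted: VIP on $(vip_holder 10.10.1.2 1 "$master" 10.10.1.3 1 "$backup")"
```

It reports the VIP on 10.10.1.3 while 10.10.1.2 is down and back on 10.10.1.2 once it restarts, even though the primary database is now on 10.10.1.3. That is exactly the mismatch described in scenario 2: the election looks only at keepalived priorities, never at the database role.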

How do we solve this? For the two scenarios above, we used a trick.

Note these two sections in the configuration files: the vrrp_script block and the track_script entry that references it.

The shell script referenced there is executed periodically on the ECS instance to check the environment (every 2 seconds, per `interval 2`). And since we can put arbitrary logic in it, there is very little it cannot do.

At this point you probably want to see the shell code.

First, let's look at the master's check logic.

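mcheckdb.sh

#!/bin/bash
# Runs on the host using the master template: if the local DB is no longer PRIMARY,
# switch this host to the backup template and restart keepalived.
max_sn="PRIMARY"
su - oracle -c "sh /etc/keepalived/oracle/dbrole.sh"   # writes the DB role to /etc/keepalived/oracle/dbrole
max_sn=$(cat /etc/keepalived/oracle/dbrole)
if [ "$max_sn" != "PRIMARY" ]
then
    cat /etc/keepalived/samples/backup.keepalived.conf > /etc/keepalived/keepalived.conf
    /etc/init.d/keepalived restart
    date > /etc/keepalived/date   # record when the switch happened
fi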

The logic is simple: switch to the oracle user and run dbrole.sh, which queries the Oracle database role and writes the result to the file /etc/keepalived/oracle/dbrole. If the result is not 'PRIMARY', the contents of /etc/keepalived/samples/backup.keepalived.conf overwrite the configuration file used by the running keepalived process, and the keepalived service is restarted. By now you can probably guess how the other side works.
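dbrole.sh itself is not shown in the article (it ships in the download). A minimal sketch of what it might look like, assuming it runs as the oracle user with ORACLE_HOME and ORACLE_SID set by the login profile (which `su - oracle` loads), that OS authentication (`/ as sysdba`) is allowed, and that the role comes from `V$DATABASE.DATABASE_ROLE`; the function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical sketch of dbrole.sh; the author's real script is not shown.
ROLE_FILE=${ROLE_FILE:-/etc/keepalived/oracle/dbrole}

# Strip leading/trailing whitespace (including newlines) that sqlplus output may carry.
normalize_role() {
  local s=$1
  s="${s#"${s%%[![:space:]]*}"}"   # trim leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # trim trailing whitespace
  printf '%s\n' "$s"
}

# Query the database role (e.g. PRIMARY or PHYSICAL STANDBY) and write it to $ROLE_FILE.
write_db_role() {
  local raw
  raw=$(sqlplus -S / as sysdba <<'EOF'
set heading off feedback off pagesize 0 verify off
select database_role from v$database;
exit
EOF
)
  normalize_role "$raw" > "$ROLE_FILE"
}
```

Saved as dbrole.sh, the script would end with a call to `write_db_role`. Trimming matters because the check scripts compare the file contents with `PRIMARY` exactly, and stray spaces or blank lines from sqlplus would make that comparison fail.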

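scheckdb.sh

#!/bin/bash
# Runs on the host using the backup template: if the local DB has become PRIMARY,
# switch this host to the master template and restart keepalived.
max_sn="PHYSICAL STANDBY"
su - oracle -c "sh /etc/keepalived/oracle/dbrole.sh"
max_sn=$(cat /etc/keepalived/oracle/dbrole)
if [ "$max_sn" = "PRIMARY" ]   # spaces around "=" are required, otherwise the test is always true
then
    cat /etc/keepalived/samples/master.keepalived.conf > /etc/keepalived/keepalived.conf
    /etc/init.d/keepalived restart
    #echo "stop keepalived"
    # Start the listener that listens on the HAVIP address
    #su - oracle -c "lsnrctl start listener2"
fi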

scheckdb.sh is much the same, with one extra step: starting the listener listener2, which is the listener bound to the HAVIP address. (In the script as shown, this step is commented out.)

We keep both the master and the backup configuration file on both ECS instances. This not only solves the problems in the two scenarios above, but also lets the HAVIP drift according to the database role, so that, as long as Data Guard is available, the HAVIP always follows our primary database.
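The rule both check scripts implement can be stated in one place: a host runs the master template exactly when its local database is PRIMARY. A minimal sketch of that decision, factored into hypothetical helpers (`template_for_role`, `needs_switch`) that are not part of the original download; as in mcheckdb.sh, any role other than PRIMARY, including an empty result from a failed query, maps to the backup template:

```shell
#!/bin/bash
# Hypothetical helpers expressing the role -> keepalived template rule.

# template_for_role ROLE: print the sample config this host should run.
template_for_role() {
  case "$1" in
    PRIMARY) echo "master.keepalived.conf" ;;
    *)       echo "backup.keepalived.conf" ;;   # PHYSICAL STANDBY or anything unexpected
  esac
}

# needs_switch CURRENT_TEMPLATE ROLE: exit 0 when keepalived must be reconfigured.
needs_switch() {
  [ "$1" != "$(template_for_role "$2")" ]
}
```

A check script built on these would copy `samples/$(template_for_role "$role")` over keepalived.conf and restart keepalived only when `needs_switch` succeeds, which is the same action mcheckdb.sh and scheckdb.sh take. Defaulting unknown roles to the backup template errs on the side of giving up the HAVIP rather than holding it without a primary database behind it.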

Finally, here is some code and a small tool.

Keepalived configuration files:

Download link:

After unpacking, put the whole keepalived directory directly under /etc. Note that it contains an oracle directory; that directory and the files in it must be owned by the oracle user.
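Because the check scripts depend on several files at fixed paths, it can help to verify the unpacked tree before starting keepalived. A minimal sketch with a hypothetical function name; the file list is assembled from paths the article mentions, and placing mcheckdb.sh next to scheckdb.sh in /etc/keepalived is an assumption:

```shell
#!/bin/bash
# Hypothetical pre-flight check for the unpacked keepalived directory.
# Usage: check_keepalived_layout [ROOT]   (ROOT defaults to /etc/keepalived)
check_keepalived_layout() {
  local root=${1:-/etc/keepalived} missing=0 f
  for f in keepalived.conf \
           mcheckdb.sh scheckdb.sh \
           samples/master.keepalived.conf samples/backup.keepalived.conf \
           oracle/dbrole.sh; do
    if [ ! -e "$root/$f" ]; then
      echo "missing: $root/$f" >&2   # report every missing file, not just the first
      missing=1
    fi
  done
  return "$missing"
}
```

For the ownership requirement, something like `chown -R oracle:oinstall /etc/keepalived/oracle` would do; the group name `oinstall` is an assumption that depends on how Oracle was installed.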

Here is also a shell tool for configuring DG that you are welcome to try. The script is tailored to a specific setup and is intended for learning only; it is not recommended for configuring production environments.

Download link: