· About HAVIP on Alibaba Cloud
Alibaba Cloud's official documentation describes it as follows:
A Private High-Availability Virtual IP Address (HaVip for short) is a private IP resource that can be created and released independently. What makes this kind of private IP special is that users can announce it from an ECS instance by means of a protocol.
One HaVip object can be bound to at most two ECS instances; a bound instance can announce the private IP via ARP.
An ECS instance that already holds one ordinary private IP can also announce multiple HaVip-type private IPs, and so hold several private IPs at the same time.
Because a private IP can be announced from an ECS instance, VRRP-based high-availability solutions become possible, including mature open-source options such as keepalived and heartbeat.
A HaVip can be bound to an EIP, so that when the HaVip switches between ECS instances, traffic sent to the EIP is also redirected to the new ECS instance.
HaVip is supported only in VPC networks. It is not available in Classic networks.
That is a lot of text to digest, so let's go straight to the diagram.
In vpc-23c099ge5 there is a vSwitch, vsw-23a3275jc. Under that vSwitch there is a HAVIP, 10.10.1.99, with two ECS instances attached to it. The address 10.10.1.99 can then float back and forth between these two ECS instances.
· What is Keepalived?
keepalived is software that works much like layer 3, 4, and 7 switching. Its job is to monitor the state of web servers: if a web server crashes or malfunctions, keepalived detects it and removes the faulty server from the pool. Once the web server is working again, keepalived automatically adds it back. All of this happens without manual intervention; the only manual work is repairing the failed web server.
Here, however, keepalived's job is to communicate with the HAVIP, so that the HAVIP points to whichever ECS instance is running the keepalived service.
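The article does not show its keepalived configuration, so the following is only a minimal sketch of what a two-node setup around the HAVIP might look like. It renders a `vrrp_instance` block for each ECS using the addresses from the example (10.10.1.2, 10.10.1.3, and the HAVIP 10.10.1.99). The interface name `eth0`, the router ID, the priorities, and the choice of unicast peers are assumptions for illustration, not values taken from the original setup.

```python
def render_keepalived_conf(local_ip, peer_ip, vip, priority,
                           interface="eth0", router_id=51):
    """Return a keepalived.conf body for one node of a two-node pair.

    interface, router_id and priority are illustrative assumptions.
    """
    lines = [
        "vrrp_instance VI_1 {",
        "    state BACKUP",                      # both nodes start as BACKUP
        f"    interface {interface}",
        f"    virtual_router_id {router_id}",    # must match on both nodes
        f"    priority {priority}",              # higher value wins the election
        "    advert_int 1",
        f"    unicast_src_ip {local_ip}",        # this node's own private IP
        "    unicast_peer {",
        f"        {peer_ip}",                    # the other ECS instance
        "    }",
        "    virtual_ipaddress {",
        f"        {vip}",                        # the HAVIP address
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_keepalived_conf("10.10.1.2", "10.10.1.3", "10.10.1.99", 100))
    print(render_keepalived_conf("10.10.1.3", "10.10.1.2", "10.10.1.99", 90))
```

Each node gets the same `virtual_router_id` and `virtual_ipaddress` but its own source address and priority; whichever node wins the VRRP election announces 10.10.1.99.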
· Oracle high availability on Alibaba Cloud
DBAs with an Oracle background all know that Oracle's high-availability cluster is Real Application Clusters (RAC), but building a RAC cluster has several hard requirements:
The network must support broadcast
Two separate networks are required, one for the heartbeat link and one for public service
Shared storage (this can be handled with an NFS mount)
The broadcast requirement alone cannot be worked around on Alibaba Cloud, not to mention that only one NIC address is available in a VPC environment. We therefore have to look at other ways to build a highly available Oracle DB environment.
Anyone with Alibaba Cloud RDS experience knows that RDS achieves high availability by switching service between a primary and a standby database: when the primary fails, the standby takes over within a short time. The cost is that sessions are dropped during the switch, which causes a brief database outage of roughly 30 seconds. Our Oracle high availability on Alibaba Cloud works in a similar way.
Back to the point: let's look at how to deploy this high-availability solution on Alibaba Cloud.
The traditional Data Guard architecture looks like this:
Typically there are two ECS instances, with IP addresses 10.10.1.2 and 10.10.1.3, running an Oracle PRIMARY-STANDBY environment. This Data Guard setup is managed with Oracle dgbroker: when the PRIMARY database crashes, the standby takes over the service automatically. But access to an Oracle database goes through the listener, and our two ECS instances have different default IP addresses. So after the standby takes over, the application's database connection pool has to change its IP to 10.10.1.3 before it can reconnect to the database service. Changing the connection pool address means restarting the container, and if the application itself has to be restarted, the setup can hardly be called highly available. Fortunately, Alibaba Cloud provides a service called HAVIP.
Let's look at the diagram below.
Here we add the HAVIP on top of the ECS instances' original IPs, and the application accesses the database service through 10.10.1.99. After the PRIMARY and STANDBY roles are swapped, the application still accesses the database service through 10.10.1.99; the address has simply drifted to what used to be the standby DB.
As is well known, an Oracle RAC environment must use shared storage, which means that if the physical files are damaged, the whole database service still goes down. The HAVIP + Data Guard solution above provides disaster recovery at the physical level of the database and also allows a fast takeover after the database service stops.
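To make the point about the connection pool concrete, here is a small sketch of an Oracle connect descriptor that targets the HAVIP rather than either ECS address. The SID `orcl` is a hypothetical placeholder, and port 1521 is assumed only because it is Oracle's default listener port; the article does not state either value.

```python
def build_connect_descriptor(host, port, sid):
    """Build a TNS-style connect descriptor string for the given endpoint."""
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SID={sid})))"
    )


# The pool is configured once with the HAVIP, not with 10.10.1.2 or 10.10.1.3.
# "orcl" is a hypothetical SID used only for illustration.
HAVIP_DESCRIPTOR = build_connect_descriptor("10.10.1.99", 1521, "orcl")
print(HAVIP_DESCRIPTOR)
```

Because the descriptor never names a specific ECS, a role swap changes which instance answers on 10.10.1.99 without any change on the application side.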
That is the overall architecture of the solution. It sounds simple, but implementing it is more troublesome. There are two main difficulties:
If the ECS keeps running but the database roles are switched, how does the HAVIP drift?
If the ECS is forcibly stopped, how does the HAVIP drift to the standby environment?
To solve these two problems we turn to keepalived. Let's look at the concrete approach.
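One way to think about the first difficulty is as a health check that keepalived runs periodically: keepalived treats a non-zero exit code from a tracked script as a failure, so a check that succeeds only on the current PRIMARY would let the HAVIP follow the database role. The sketch below is an assumption about how such a check could be written, not the author's script. It expects the text output of a query such as `select database_role from v$database` and maps it to an exit code; the function names are hypothetical.

```python
def is_primary(query_output):
    """Return True if any line of the query output is exactly PRIMARY."""
    return any(line.strip().upper() == "PRIMARY"
               for line in query_output.splitlines())


def health_exit_code(query_output):
    """Map the database role to an exit code for a keepalived check script.

    0 means this node should be eligible to hold the HAVIP; 1 means it
    should not (standby role, empty output, or an error message).
    """
    return 0 if is_primary(query_output) else 1


# Example outputs the check might see:
print(health_exit_code("PRIMARY\n"))                         # primary node
print(health_exit_code("PHYSICAL STANDBY\n"))                # standby node
print(health_exit_code("ORA-01034: ORACLE not available"))   # database down
```

The second difficulty, a forcibly stopped ECS, needs no script at all: the stopped node simply stops sending VRRP advertisements, and the surviving node takes over the address.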
· Implementation approach
1: First, we create a Data Guard environment. To ensure the connection pool needs no changes after a switchover, the DB SID must be identical on both ECS instances.
2: Oracle Data Guard is managed through DGbroker; when the primary DB crashes, the physical standby DB automatically switches to primary. The observer must be started on the STANDBY side here. We tried forcing, in the management console, …