2009-12-31 09:06:09

The other day, while at a customer site to replace a disk in a FastT200, disaster struck; thinking back on it still frightens me.

The customer's FastT200 had six disks in all: the first four 34 GB disks formed a RAID 5 array carved into two logical drives, and the last two 73 GB disks formed a RAID 1 array. One of the RAID 1 disks had failed, degrading the array, so it needed to be replaced; the cache battery was also raising an alarm and needed replacing too. I had no 73 GB disk with me at the time, only one of a hundred-odd GB, and without thinking much about it I swapped that in. What I never expected was that, after a quiet wait, one of the logical drives on the RAID 5 array failed. That left the database hosted on the storage unable to start, and nothing we tried afterwards could recover it; and since the Oracle database sat on raw devices... well, the data was simply gone. I still do not really understand why this happened, so I am posting the FastT200 log from that day in the hope that someone can point me in the right direction:

Details

Storage Subsystem:
Array: 1
RAID level: 5
Logical Drives: data1, data2
Status: Failed

Failed Logical Drive - Drive Failure

What Caused the Problem?

One or more drives in the array have failed, causing the associated logical drives to fail. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.

Caution
Electrostatic discharge can damage sensitive components. Use a grounding wrist strap or other anti-static precautions before removing or handling components.

Important Notes

  • If the logical drive is marked failed because you replaced the wrong drive during a degraded logical drive recovery procedure, you have not lost data. To return the logical drive to the degraded state, reinsert the drive.
  • You may be able to recover data from a failed logical drive. Whether or not this is possible depends on how the failure occurred. You can use this procedure to restore data in two ways: attempting a data recovery or restoring data from backup media.
  • All I/O to the affected logical drives will fail.
  • To the operating system (OS), a failed logical drive is exactly the same as a failed non-RAID drive. Refer to the operating system documentation for any special requirements concerning failed drives and perform them where necessary.
  • Make sure the replacement drives have a capacity equal to or greater than the failed drives you will remove in the following steps.
  • You can replace the failed drives while other arrays in the storage subsystem are receiving I/O.

Recovery Steps

1

It may be possible to recover data from the failed logical drives. If you wish to attempt a data recovery, you must contact your technical support representative. Do not perform steps 2 - 8. Performing any recovery actions before contacting your technical support representative could jeopardize any chance of recovering data. If you prefer to recover from an existing backup or you have mistakenly removed the wrong drive while performing a degraded logical drive recovery procedure, go to step 2.

2

If you have mistakenly removed the wrong drive while performing a degraded logical drive recovery procedure, you can return the logical drive back to the degraded state by replacing the drive you removed. After the logical drives return to the degraded state, select Recheck and perform the recovery procedure listed for a degraded logical drive. You are finished with this procedure.

3

There are several different types of logical drives that can exist in an array. Use the Recovery Guru details area to determine the affected array. Then, find the array in the Logical View of the Subsystem Management Window. Use the information provided by the AMW to determine the types of logical drives on the affected array. Step through every entry in the following table and perform all procedures associated with the logical drive type combination for the affected array.

If: One or more FlashCopy logical drives exist on the affected array
Then: The information on the FlashCopy(s) is no longer valid and cannot be retrieved. Delete all FlashCopy logical drives associated with the affected array. You will be able to create any needed FlashCopies after this procedure has been completed.

If: One or more FlashCopy repository logical drives associated with FlashCopies on other arrays exist on the affected array
Then: The information on the FlashCopy(s) is no longer valid and cannot be retrieved. Delete all FlashCopy logical drives associated with the FlashCopy repositories on the affected array. You will be able to create any needed FlashCopies after this procedure has been completed.

If: The mirror repository logical drives exist on the affected array
Then: All mirror relationships associated with primary mirror logical drives on this storage subsystem are invalid.
  1. Save the Storage Subsystem Profile before removing mirror relationships. The profile will give you a roadmap of any mirror relationships you may want to recreate after reactivating RM. You can do this by using the View>>Storage Subsystem Profile>>Save As feature in the Storage Subsystem Management window.
  2. Perform a remove mirror relationship for all primary mirror logical drives.
  3. Deactivate the RM feature.
  4. Activate the RM feature.

If: One or more primary logical drive mirrors exist on the affected array
Then: You have two options to recover from this failure:
  1. Remove the mirror relationship and re-establish it after the array is brought back to optimal, OR
  2. If role reversal is possible, bring the array back to optimal, change the role of the primary logical drive to secondary, and return it back to primary once the synchronization has completed.
  If you wish to use option 1, remove the mirror relationship and go to Step 4. If you wish to use option 2, go to Step 4.

If: One or more secondary logical drive mirrors exist on the affected array
Then: The mirror relationship will be returned to an optimal state once the array is brought back to optimal and the mirror has re-synchronized. Go to Step 4.

If: Only standard logical drives exist on the affected array
Then: Go to Step 4.

4

Remove all failed drives associated with this array (the fault indicator lights on the failed drives should be lit). To determine the associated drives, select one of the affected logical drives that are listed in the Recovery Guru Details area in the Subsystem Management Window. Each associated drive will have an association dot underneath it.

5

Wait 30 seconds, then insert the new drives. The fault indicator light on the replaced drives may be lit for a short time (one minute or less).

Note: Wait until the replaced drives are ready (fault indicator light off) before going to step 6.

6

If: One or more primary logical drive mirrors exist on the affected array
Then: Change the role of all primary logical drives to a secondary role. Wait until the synchronization is completed on all logical drives before continuing. See the comment in Step 3 above.

7

If: A primary or secondary logical drive mirror relationship exists on the affected array
Then: Save this procedure by selecting Save As, because once you perform step 8 and the failure is fixed, you will not be able to access the information in step 9 from the Recovery Guru. Go to Step 8.

If: No primary or secondary logical drive mirror relationships exist on the affected array
Then: Select the array in the Logical View of the Subsystem Management Window; then, select Array>>Initialize. Result: The logical drives in the array are initialized, one at a time. When initialization starts on a logical drive, the icon changes to Operation in Progress. When initialization is completed, all logical drives in the array are Optimal.
Note: To monitor initialization progress, select the logical drive. Then, select Logical Drive>>Properties. Note that once the operation in progress has completed, the progress bar is no longer displayed in the Properties dialog.
Save this procedure by selecting Save As, because once you perform step 8 and the failure is fixed, you will not be able to access the information in step 9 from the Recovery Guru. Go to Step 8.

8

Select Recheck to rerun the Recovery Guru to ensure that the failure has been fixed.

9

If: You deleted one or more FlashCopy logical drives or FlashCopy repositories in Step 3
Then: If desired, create new FlashCopies to replace those deleted.

If: All mirror repositories were deleted because of failed mirror repository logical drives on the affected array
Then: Activate the RM feature. If desired, create new mirror relationships to replace those deleted. You may want to save the Storage Subsystem Profile before removing mirror relationships. Refer to item 3 in Step 3 for the steps used to save the Storage Subsystem Profile.

If: One or more primary logical drive mirrors were changed to a secondary role in Step 6
Then: Change the role of these logical drives back to a primary role. Wait until the synchronization is completed on all logical drives before continuing.

If: One or more logical drives were initialized in Step 7
Then: Add the initialized logical drives in the affected array back to the operating system. You may need to reboot the system to see the logical drives.
Note: Do not start I/O to these logical drives until after you restore from backup.
Restore the data for the initialized logical drives from backup.

In the end there was nothing for it but to rebuild the FastT200 storage for the customer and then rebuild the database:

Oracle database rebuild procedure (8.1.7)

1. Since the database software itself was not damaged, there is no need to reinstall the Oracle software.

2. The plan is to build the database on raw devices.

Preparation

1. Identify the basic setup of the original database installation, including the database base directory, the Oracle home directory, the instance name, and so on. This can be learned by examining .profile:

PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:$HOME/bin:/usr/bin/X11:/sbin:.
export PATH

if [ -s "$MAIL" ]           # This is at Shell startup.  In normal
then echo "$MAILMSG"        # operation, the Shell checks
fi                          # periodically.

umask 022

export DISPLAY=133.43.1.58:0.0
export ORACLE_BASE=/oracle
export ORACLE_SID=yhlbas
export TMP=/oracle/temp

export ORACLE_HOME=$ORACLE_BASE/product/8.1.7
export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
export PATH=$PATH:$ORACLE_HOME/bin:/bin:/usr/ccs/bin:/usr/lbin:/usr/bin
export LIB_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/network/lib
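The same settings can also be confirmed from a live session; a quick hedged check (the egrep pattern is just illustrative):

su - oracle
env | egrep 'ORACLE|NLS|PATH'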

 

2. Determine how many databases need to be created, the tablespaces each database needs, and a detailed plan of each database's control files, online redo log files, archived log files, data files, and so on;

There is one database; the details can be learned from the original database's pfile (inityhlbas.ora):

db_name = "yhlbas"
instance_name = yhlbas
 
service_names = yhlbas
 
 
control_files = ("/dev/rlv_jfvg_128_01", "/dev/rlv_jfvg_128_02", "/dev/rlv_jfvg_128_03")
 
open_cursors = 300
max_enabled_roles = 30
db_block_buffers = 153600
#76800
 
shared_pool_size =209715200
 #157286400
 
large_pool_size = 31457280
java_pool_size = 31457280
 
log_checkpoint_interval = 10000
log_checkpoint_timeout = 180000
 
processes = 500
#150
 
log_buffer = 3145728
 
# audit_trail = false  # if you want auditing
# timed_statistics = false  # if you want timed statistics
# max_dump_file_size = 10000  # limit trace file size to 5M each
 
# Uncommenting the lines below will cause automatic archiving if archiving has
# been enabled using ALTER DATABASE ARCHIVELOG.
log_archive_start = true
log_archive_dest_1 = "location=/arch2 reopen=600"
log_archive_format = arch_%t_%s.arc
 
 
#DBCA uses the default database value (30) for max_rollback_segments
#100 rollback segments (or more) may be required in the future
#Uncomment the following entry when additional rollback segments are created and made online
#max_rollback_segments = 151
# If using private rollback segments, place lines of the following
# form in each of your instance-specific init.ora files:
rollback_segments = ( RBS0, RBS1, RBS2, RBS3, RBS4, RBS5, RBS6, RBS7, RBS8, RBS9, RBS10, RBS11, RBS12, RBS13, RBS14, RBS15, RBS16, RBS17, RBS18, RBS19, RBS20, RBS21, RBS22, RBS23, RBS24, RBS25, RBS26, RBS27, RBS28 )
 
# Global Naming -- enforce that a dblink has same name as the db it connects to
# global_names = false
 
# Uncomment the following line if you wish to enable the Oracle Trace product
# to trace server activity.  This enables scheduling of server collections
# from the Oracle Enterprise Manager Console.
# Also, if the oracle_trace_collection_name parameter is non-null,
# every session will write to the named collection, as well as enabling you
# to schedule future collections from the console.
# oracle_trace_enable = true
 
# define directories to store trace and alert files
background_dump_dest = /oracle/admin/yhlbas/bdump
core_dump_dest = /oracle/admin/yhlbas/cdump
#Uncomment this parameter to enable resource management for your database.
#The SYSTEM_PLAN is provided by default with the database.
#Change the plan name if you have created your own resource plan.
# resource_manager_plan = system_plan
user_dump_dest = /oracle/admin/yhlbas/udump
 
db_block_size = 8192
 
remote_login_passwordfile = exclusive
 
os_authent_prefix = ""
 
compatible = "8.1.0"
sort_area_size = 1048576
sort_area_retained_size = 1048576
 
#20060324 by olm
db_file_multiblock_read_count=64
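As a rough sanity check, the SGA these parameters imply can be worked out by hand; a quick sketch using shell arithmetic (values copied straight from the pfile above):

# buffer cache = db_block_buffers * db_block_size
echo $((153600 * 8192))                                # 1258291200 bytes, about 1.2 GB
# shared pool + large pool + java pool + log buffer
echo $((209715200 + 31457280 + 31457280 + 3145728))    # 275775488 bytes, about 263 MB

So the SGA comes to roughly 1.5 GB, which the host has to accommodate alongside up to 500 server processes.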

3. Back up the original database's password file, parameter file, and .profile (see the sketch after this list);

4. Set up the physical hardware environment, prepare the corresponding raw devices as carriers, and set the ownership and permissions of those devices.
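For item 3, a minimal sketch of the copies involved; the paths are assumptions inferred from the environment variables above (and /backup is a hypothetical staging directory), not taken from the original post:

cp /oracle/.profile /backup/
cp $ORACLE_HOME/dbs/inityhlbas.ora /backup/      # parameter file (may be a symlink into $ORACLE_BASE/admin)
cp $ORACLE_HOME/dbs/orapwyhlbas /backup/         # password file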

Preparing the physical environment

1. Power off the FastT200 storage;

2. Install the replacement hard disk and the replacement controller cache battery;

3. Power the FastT200 storage back on;

4. In the management software, check the status of the disks and the controller battery, and reset the controller battery date; once all the physical hardware is back to normal, log in to the storage over the serial port.

5. Run the sysWipe command on each controller to clear away all of the original configuration.

6. Restart the FastT200 storage, log in to each controller over the serial port again, and use the netCfgSet command to set each controller's IP information, putting them on the same subnet as the AIX host for easier management.

7. Log in to the management software and carry out the storage setup: create the logical drives, create the RAID arrays, create the logical disks, define the host, define the host ports, and so on.

The operations are roughly as follows (the original post's screenshots are omitted):

(1) Connect the network ports of the FastT200's two controller cards to the same switch as the management station running SM9.

(2) In the SM management interface, add the device and enter the IP addresses of the FastT200 controller cards (they can also be discovered automatically).

(3) Create the logical drives: open the wizard; select the RAID level and disks; select the size of the logical drive to create and its LUN; when creation completes, you can go on to create more logical drives.

(4) Define the host: enter the host name.

(5) Define the host ports: select the serial number of the Fibre Channel adapter, then enter the host type and the host port name.

(6) Done.

 

Steps on the AIX hosts:

1. Export the original volume group jfvg (before doing so, if jfvg is active, run varyoffvg jfvg);

2. Remove the disks previously recognized from the storage:

rmdev -dl hdisk3
rmdev -dl hdisk4
rmdev -dl hdisk5
rmdev -dl hdisk6
rmdev -dl hdisk7
rmdev -dl hdisk8
rmdev -dl hdisk9
rmdev -dl hdisk10
rmdev -dl hdisk11
rmdev -dl hdisk12
rmdev -dl hdisk13
rmdev -dl hdisk14
rmdev -dl hdisk15
rmdev -dl hdisk16
rmdev -dl hdisk17
rmdev -dl hdisk18
rmdev -dl hdisk19
rmdev -dl hdisk20
rmdev -dl hdisk21
rmdev -dl hdisk22
rmdev -dl hdisk23
rmdev -dl hdisk24
rmdev -dl hdisk25
rmdev -dl hdisk26
rmdev -dl hdisk27
rmdev -dl hdisk28
rmdev -dl hdisk29
rmdev -dl hdisk30
rmdev -dl hdisk31
rmdev -dl hdisk32
rmdev -dl hdisk33
rmdev -dl hdisk34
rmdev -dl hdisk35
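Typing 33 rmdev commands by hand is error-prone; a loop does the same thing, assuming hdisk3 through hdisk35 are exactly the FastT200 LUNs and nothing else:

i=3
while [ $i -le 35 ]
do
    rmdev -dl hdisk$i
    i=`expr $i + 1`
done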

 

3. Rediscover the disks presented from the storage: cfgmgr -v

The three steps above must be executed on both the primary and the standby host; the steps that follow need only be executed on the primary host.

4. Create the volume group jfvg (set the volume group's PP size to 128 MB; a hedged example follows);
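A minimal sketch of the command for item 4; the hdisk names are placeholders for whatever cfgmgr discovered in step 3, not taken from the post:

mkvg -y jfvg -s 128 hdisk3 hdisk4 hdisk5 hdisk6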

5. Create the raw devices (create this many for now; more can be created later as needed):

mklv -y lv_jfvg_512_01 -t raw jfvg 4
mklv -y lv_jfvg_512_02 -t raw jfvg 4
mklv -y lv_jfvg_512_03 -t raw jfvg 4
mklv -y lv_jfvg_512_04 -t raw jfvg 4
mklv -y lv_jfvg_512_05 -t raw jfvg 4
mklv -y lv_jfvg_512_06 -t raw jfvg 4
mklv -y lv_jfvg_512_07 -t raw jfvg 4

(The seven logical volumes above supplement the data files of the system tablespaces once the database has been created.)

mklv -y lv_jfvg_512_08 -t raw jfvg 4
mklv -y lv_jfvg_512_09 -t raw jfvg 4
mklv -y lv_jfvg_512_10 -t raw jfvg 4
mklv -y lv_jfvg_512_11 -t raw jfvg 4
mklv -y lv_jfvg_512_12 -t raw jfvg 4
mklv -y lv_jfvg_512_13 -t raw jfvg 4
mklv -y lv_jfvg_512_14 -t raw jfvg 4

(The logical volumes above can supplement the temporary tablespace, rollback tablespace, and so on.)

mklv -y lv_jfvg_2048_01 -t raw jfvg 16
mklv -y lv_jfvg_2048_02 -t raw jfvg 16
mklv -y lv_jfvg_2048_03 -t raw jfvg 16
mklv -y lv_jfvg_2048_04 -t raw jfvg 16
mklv -y lv_jfvg_2048_05 -t raw jfvg 16
mklv -y lv_jfvg_2048_06 -t raw jfvg 16
mklv -y lv_jfvg_2048_07 -t raw jfvg 16
mklv -y lv_jfvg_2048_08 -t raw jfvg 16
mklv -y lv_jfvg_2048_09 -t raw jfvg 16
mklv -y lv_jfvg_2048_10 -t raw jfvg 16

(The logical volumes above serve as tablespace data files.)
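With the 128 MB PP size chosen for jfvg, 4 PPs gives each lv_jfvg_512_* volume 512 MB and 16 PPs gives each lv_jfvg_2048_* volume 2048 MB. mklv creates a block device /dev/lv_* and a matching character device /dev/rlv_* for each volume; the rlv_ character devices are what Oracle opens as raw files (as the control_files entries in the pfile above show). A quick check:

ls -l /dev/rlv_jfvg_*
lsvg -l jfvg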

6. Change the ownership of these devices:

chown oracle:dba /dev/*lv_jfvg_*

7. Change the permissions of these devices (granting them the widest permissions here):

chmod 777 /dev/*lv_jfvg_*

8. Create an /oracle2 filesystem on jfvg to hold the control files, system tablespace data files, and other files used when creating the database, and use chown to change the filesystem's ownership so the oracle user can read and write it (a hedged sketch follows).
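A minimal sketch for item 8; the size is illustrative only (crfs takes it in 512-byte blocks, so 4194304 blocks is 2 GB):

crfs -v jfs -g jfvg -m /oracle2 -a size=4194304
mount /oracle2
chown oracle:dba /oracle2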

 

Database creation procedure

Before starting, back up the .profile and inityhlbas.ora files, along with the password file and the listener files.

1. Delete the original database (with the dbassist tool). If this cannot be done because the original database will not start to the mount state, remove the original instance by hand instead, as follows:

Delete the original database's data files, log files, control files, dump files, parameter file, and password file;

In the system's oratab file, delete the line that records the original instance (the file's exact location varies, so the find command can be used to look for it; see the sketch below).
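The oratab file usually lives in /etc or /var/opt/oracle; a hedged way to locate it:

find /etc /var/opt -name oratab -print 2>/dev/null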

2. Create the corresponding raw devices and set their ownership and permissions (already done above);

3. Create the database, using the dbassist tool (the original post's screenshots are omitted):

Enter the Configuration Assistant window and select Create Database.

Choose how to create the database; here choose the custom option.

Choose the type of database to create; here choose Multipurpose.

Set the maximum number of users that will access the database concurrently, according to the actual situation.

Choose which mode the database will use, dedicated server or shared server; here I choose dedicated mode (the first option).

Select the options you want configured in the database (usually the last item is all you need).

Enter the database name, the instance name, the pfile path, and the character set. The database name is whatever the customer requires, but the instance name must match the instance name specified in the environment variables.

Enter the devices that will carry the control files and set the related parameters.

Set the sizes of the SYSTEM tablespace, TOOLS tablespace, USERS tablespace, and so on, along with their corresponding data files and related parameters.

Specify the online redo log files and their related parameters.

Set parameters such as the checkpoint interval (the defaults are generally fine); also decide whether to enable archiving, and specify the archived log format and destination.

The next screen can be left at its defaults.

Set the shared pool and related values (these can be taken from the original database's settings).

Set the trace file (dump) directories; the defaults are generally fine.

Click Create database now to complete creation of the database; a hedged command-line equivalent follows.
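For reference, behind the wizard this amounts to running CREATE DATABASE by hand in svrmgrl; a minimal hedged sketch reusing the raw devices created earlier (the pfile path, device names, and sizes are illustrative only, and on raw devices each file must be sized slightly smaller than its logical volume):

svrmgrl
connect internal
startup nomount pfile=/oracle/admin/yhlbas/pfile/inityhlbas.ora
CREATE DATABASE yhlbas
  LOGFILE GROUP 1 '/dev/rlv_jfvg_512_01' SIZE 500M,
          GROUP 2 '/dev/rlv_jfvg_512_02' SIZE 500M
  DATAFILE '/dev/rlv_jfvg_2048_01' SIZE 2000M
  CHARACTER SET ZHS16GBK;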

 

The database creation steps above were performed on the primary host; next the database is configured on the standby host to complete the Oracle HA pair.

Because the control files, online redo log files, and tablespace data files were all placed on the storage when the database was created on the primary host, it is enough to copy the relevant parameter file, password file, trace files, listener files, and so on from the primary host into the corresponding directories on the standby host, guided by the primary host's parameter file and its Oracle configuration:

1. Stop the Oracle application on the primary host and run varyoffvg jfvg;

2. On the standby host, import the volume group jfvg: importvg -y jfvg hdiskname (where hdiskname is one of the disks recognized from the storage that was placed into jfvg on the primary host);

3. Copy the relevant files from the primary host to the standby host, overwriting the standby host's originals.

4. Start the database.

If the database reports errors on startup here, check the alert log for the cause and make the corresponding changes, keeping the primary host's Oracle configuration as the reference.

5. Once the database starts up cleanly on the standby host, stop the Oracle database application.

6. Test the HA pair (see the sketch after this list). Because the original HA configuration was never destroyed, and the recreated jfvg is identical to the original one (in other words, none of the volume groups, networking, and so on changed in the rebuild), the HA configuration needs no modification. After synchronizing the resources between the primary and standby hosts, start HA and check whether its status is normal and whether the Oracle database starts properly. If anything goes wrong at this step, make the corresponding fixes according to the error logs until the tests pass.
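The takeover path can also be exercised by hand before HA drives it; a hedged sketch (hdisk3 stands in for any jfvg member disk):

# on the primary
varyoffvg jfvg
# on the standby: importvg (first time only) varies the VG on by itself;
# on later passes use varyonvg jfvg instead
importvg -y jfvg hdisk3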

Creating Oracle users, tablespaces, and so on, and performance tuning

Here I use the Oracle client GUI tools for this, so the following preparation has to be done first:

1. Configure Oracle networking on the host: the listener, net service names, naming methods, and so on. Since the rebuilt database's instance name, database name, IP address, and database mode are all unchanged, we only need to restore the original listener files, tnsnames.ora, and so on to their former state (these files were backed up in the earlier steps); of course they can also be edited by hand, or through the netmgr GUI (8.1.7
