I'm removing 4 OSTs. I want to replace each of them with a new disk and reuse the same index number.
# lfs osts
OBDS:
0: cluster-OST0000_UUID ACTIVE
1: cluster-OST0001_UUID ACTIVE
2: cluster-OST0002_UUID ACTIVE
3: cluster-OST0003_UUID ACTIVE
. . .
Disable file creation on that OST by setting max_create_count to zero:
# lctl get_param osp.cluster-OST0000-osc-MDT*.max_create_count
osp.cluster-OST0000-osc-MDT0000.max_create_count=20000
osp.cluster-OST0000-osc-MDT0001.max_create_count=20000
...
# lctl set_param osp.cluster-OST0000-osc-MDT*.max_create_count=0
osp.cluster-OST0000-osc-MDT0000.max_create_count=0
osp.cluster-OST0000-osc-MDT0001.max_create_count=0
...
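Since four OSTs are being drained, the same setting has to be applied to each of them. The following is a minimal sketch that only prints the commands for review instead of running them. The helper names "ost_name" and "print_disable_cmds" are hypothetical, and the 4-digit hexadecimal index format is inferred from the "lfs osts" output above.

```shell
#!/bin/sh
# Hypothetical helper: build the OST target name from fsname and index,
# e.g. "cluster" and 0 -> cluster-OST0000 (index printed as 4 hex digits).
ost_name() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Print (do not run) the set_param command for each OST index given.
print_disable_cmds() {
    fsname=$1; shift
    for idx in "$@"; do
        printf 'lctl set_param osp.%s-osc-MDT*.max_create_count=0\n' \
            "$(ost_name "$fsname" "$idx")"
    done
}

# Dry run for the four OSTs that are being replaced.
print_disable_cmds cluster 0 1 2 3
```

After reviewing the printed lines, they can be pasted into a root shell on the MDS servers.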
I moved all files to other OSTs. Afterwards, "lfs find" cannot find any files on these 4 OSTs.
Files can be moved by copying them to a new name and then renaming ("mv") the copy back to the original name,
or with the command "lfs_migrate -sy":
lfs find --obd cluster-OST0000_UUID /cluster | lfs_migrate -sy
Find files on multiple OSTs:
lfs find --ost 0 --ost 1 --ost 2 --ost 3 /cluster
Checking the space and inodes still in use
After moving all files, 2624 inodes are still in use on the 4 OSTs, with a total size of 14.5G.
# lfs df -i | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID 4293438576 644 4293437932 1% /cluster[OST:0]
cluster-OST0001_UUID 4293438576 640 4293437936 1% /cluster[OST:1]
cluster-OST0002_UUID 4293438576 671 4293437905 1% /cluster[OST:2]
cluster-OST0003_UUID 4293438576 669 4293437907 1% /cluster[OST:3]
# lfs df -h | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003
cluster-OST0000_UUID 29.2T 3.8G 27.6T 1% /cluster[OST:0]
cluster-OST0001_UUID 29.2T 3.7G 27.6T 1% /cluster[OST:1]
cluster-OST0002_UUID 29.2T 3.3G 27.6T 1% /cluster[OST:2]
cluster-OST0003_UUID 29.2T 3.7G 27.6T 1% /cluster[OST:3]
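The inode total quoted above can be recomputed from the "lfs df -i" lines. This is a small sketch, assuming the third column is the used-inode count as in the output shown; the function name "sum_used_inodes" is made up for illustration.

```shell
#!/bin/sh
# Sum the third column (used inodes in `lfs df -i` output) over all
# OST lines read from standard input.
sum_used_inodes() {
    awk '/OST[0-9a-fA-F]+_UUID/ { total += $3 } END { print total + 0 }'
}

# Intended use on a client (not run here):
#   lfs df -i /cluster | grep -e OST0000 -e OST0001 -e OST0002 -e OST0003 | sum_used_inodes
```

Fed with the four lines above, it adds 644 + 640 + 671 + 669.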
I tried to check the file system for errors:
# umount /lustre/ost01
# e2fsck -fy /dev/mapper/ost01
and
# lctl lfsck_start --device cluster-OST0001
# lctl get_param -n osd-ldiskfs.cluster-OST0001.oi_scrub
. . .
status: completed
I mounted the OST as ldiskfs, and there are still several files in /O/0/d*/:
# umount /lustre/ost01
# mount -t ldiskfs /dev/mapper/ost01 /mnt/
# ls -Rhl /mnt/O/0/d*/
. . .
/mnt/O/0/d11/:
-rw-rw-rw- 1 user1 group1 603K Nov 8 21:37 450605003
/mnt/O/0/d12/:
-rw-rw-rw- 1 user1 group1 110K Jun 16 2023 450322028
-rw-rw-rw- 1 user1 group1 21M Nov 8 22:17 450605484
. . .
I checked all those files with "ll_decode_filter_fid" and "lfs fid2path":
On OSS servers:
# umount /lustre/ost01
# mount -t ldiskfs /dev/mapper/ost01 /mnt/
# find /mnt/O/0/d*/ -type f
/mnt/O/0/d11/450605003
/mnt/O/0/d12/450605484
. . .
# ll_decode_filter_fid /mnt/O/0/d11/450605003
/mnt/O/0/d11/450605003: parent=[0x200019425:0x733f:0x0] stripe=0 stripe_size=1048576 stripe_count=1 layout_version=0 range=0
# umount /mnt
# mount -t lustre /dev/mapper/ost01 /lustre/ost01
On client:
# lfs fid2path /cluster [0x200019425:0x733f:0x0]
lfs fid2path: cannot find /cluster [0x200019425:0x733f:0x0]: No such file or directory
I got "No such file or directory" for all those unknown files. So I can assume those are "stray objects from deleted files".
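Checking every leftover object by hand is tedious, so the two steps can be scripted. The sketch below only covers the text processing: it pulls the parent FID out of "ll_decode_filter_fid" output in the format shown above. The function name "parent_fids" and the file /tmp/fids.txt are hypothetical, and the commented usage assumes the tool writes its result lines to standard output.

```shell
#!/bin/sh
# Extract the parent FID, e.g. [0x200019425:0x733f:0x0], from each line of
# ll_decode_filter_fid output read on standard input.
parent_fids() {
    sed -n 's/.*parent=\(\[[^]]*]\).*/\1/p'
}

# Intended use (not run here).
# On the OSS, with the OST mounted as ldiskfs on /mnt:
#   find /mnt/O/0/d*/ -type f -exec ll_decode_filter_fid {} + | parent_fids > /tmp/fids.txt
# Copy /tmp/fids.txt to a client, then:
#   while read -r fid; do
#       lfs fid2path /cluster "$fid" >/dev/null 2>&1 || echo "stray: $fid"
#   done < /tmp/fids.txt
```

Any FID reported as "stray" has no path in the namespace, which matches the "No such file or directory" result above.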
Deactivate the OST
lctl conf_param cluster-OST0000.osc.active=0
...
Check
# lctl get_param osp.cluster-OST000*-osc-MDT*.active
osp.cluster-OST0000-osc-MDT0000.active=0
osp.cluster-OST0000-osc-MDT0001.active=0
osp.cluster-OST0001-osc-MDT0000.active=1
osp.cluster-OST0001-osc-MDT0001.active=1
...
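In the check output above, OST0000 is inactive on both MDTs while OST0001 still shows active=1, so the deactivation has to be repeated for each remaining OST. A small filter can list the entries that are still active for one OST; this is a sketch with a made-up function name "still_active", and it only parses "get_param" output in the format shown.

```shell
#!/bin/sh
# Read `lctl get_param osp.*.active` output on standard input and print the
# entries of the given OST (e.g. cluster-OST0001) that still have active=1.
still_active() {
    awk -v ost="$1" 'index($0, ost "-osc-") && /\.active=1$/ { print }'
}

# Intended use on an MDS (not run here):
#   lctl get_param osp.cluster-OST000*-osc-MDT*.active | still_active cluster-OST0001
```

No output means every MDT already sees that OST as inactive.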
Prepare the new disk
Format the replacement OST with the same index number, using "--replace".
The "@o2ib" suffix on the addresses denotes an InfiniBand network.
# mkfs.lustre --ost --reformat --replace --index=0 --fsname=cluster \
    --mgsnode=10.3.1.6@o2ib --mgsnode=10.3.1.7@o2ib \
    --servicenode=10.3.1.2@o2ib --servicenode=10.3.1.3@o2ib \
    --mkfsoptions="-i 2048" /dev/mapper/ost00
On the MDS server, reactivate the OST and re-enable file creation on it:
lctl conf_param cluster-OST0000.osc.active=1
lctl set_param osp.cluster-OST0000-osc-MDT*.max_create_count=20000