1 简介

有时文本存在大量的需要删除或替换的内容,如果手动则工作量非常大。使用Linux的sed命令可以完成大量的替换和删除操作,功能非常强大。这里仅列举实际中遇到的场景进行示例。

就ES应用本身来讲,问题的排查思路可以从两个大方向开展:

2 示例

2.1 删除匹配到行的下一行内容

需求来源:使用smplayer播放器播放视频,不自动加载字幕,字幕配置没有问题。经检查发现是字幕文件有问题。问题字幕如下:

1
00:00:00,460 --> 00:00:05,190
Welcome. You must be excited about how to create and write your first go program!
1

2
00:00:05,430 --> 00:00:10,110
So fasten your seat belts and get your favorite drink wine is coffee.
2

3
00:00:10,110 --> 00:00:16,200
We're starting. Now I'm going to show you how to create and run a very simple Go program. You will print
3

正常字幕如下:

1
00:00:00,460 --> 00:00:05,190
Welcome. You must be excited about how to create and write your first go program!

2
00:00:05,430 --> 00:00:10,110
So fasten your seat belts and get your favorite drink wine is coffee.

3
00:00:10,110 --> 00:00:16,200
We're starting. Now I'm going to show you how to create and run a very simple Go program. You will print

从上面的对比可以看出,需要将问题字幕中每行英文字母下面的数字删掉。如果数量少可以手动删除,但如果有上千行手动就非常累了。

根据正常字幕的格式,可以使用sed,匹配所有字母开头的行,然后删除匹配到的行的后面一行。命令如下:

sed -i '/^[a-zA-Z]/{n;d}' file.srt