
Remembered the words but forgot the files? Ask Ripgrep.

Ripgrep is a fast, feature-rich tool for searching files.

I often have to find a few words among tens, hundreds, or even thousands of files in one folder, and I’m not sure whether the words are in subfolders or not.

So I need a helper, and that helper is “Ripgrep”.


Ripgrep

Ripgrep is a command-line tool built for searching file contents. There are claims [1] that Ripgrep is faster than many other grep tools.

The project's source code is hosted on GitHub.

In my experience, both its speed and its output are satisfying. Results are concise, readable, and highlighted in color.

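By default, Ripgrep treats the pattern we supply as a regular expression, and it skips hidden files as well as files ignored by rules such as those in “.gitignore”. We can change this with flags, for example to match whole words only, to list file names only, or to include hidden files.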

Basic use

Assuming Ripgrep is already installed on our computer, we can start searching like this.

rg "<regex>"     # current path
rg "<regex>" .   # current path as . (dot)

[Screenshot: rg basic]

This simple command uses Ripgrep to search for that text, as a regular expression, in the files of the current directory and its subdirectories. Hidden, ignored, and binary files are skipped by default.
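To try the basic search without touching real files, here is a minimal sketch. The folder layout, file names, and the pattern "Bos.on" are made up for the demo, and the directory is passed explicitly so the command behaves the same inside a script.

```shell
# Hypothetical sample files, created only for this demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'Boston is in Massachusetts\n' > "$demo_dir/cities.txt"
printf 'flight to Boston\nflight to Austin\n' > "$demo_dir/sub/trips.txt"
printf 'nothing relevant here\n' > "$demo_dir/other.txt"

if command -v rg >/dev/null 2>&1; then
  # "Bos.on" is a regular expression: the dot matches any single character.
  # -n prints line numbers even when the output is not a terminal.
  basic_out=$(rg -n "Bos.on" "$demo_dir" | sort)
  printf '%s\n' "$basic_out"
else
  echo "rg is not on PATH; skipping the demo"
fi

rm -rf "$demo_dir"
```

Each output line has the form path:line number:matching line, and the file in the subfolder is found without any extra flag.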

Find text in hidden directory

We can use the flag --hidden, or its short form -., to search hidden files and folders as well, like this.

rg --hidden "<regex>" .
rg -. "<regex>" .

[Screenshot: rg hidden]
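Hidden files and ignored files are two separate filters, and they are easy to mix up. The sketch below shows the difference under a few assumptions: the file names and the word "token" are made up, and the ignore rule lives in a .ignore file (same syntax as .gitignore), because as far as I know rg applies .gitignore rules only inside a Git repository.

```shell
# Hypothetical sample files, created only for this demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/.config"
printf 'token here\n' > "$demo_dir/visible.txt"
printf 'token here\n' > "$demo_dir/.config/settings.txt"  # inside a hidden folder
printf 'token here\n' > "$demo_dir/build.log"             # ignored by the rule below
printf '*.log\n' > "$demo_dir/.ignore"                    # same syntax as .gitignore

if command -v rg >/dev/null 2>&1; then
  # Default: hidden and ignored files are both skipped.
  default_files=$(rg -l "token" "$demo_dir" | sort)
  # --hidden adds hidden files, but ignore rules still apply.
  hidden_files=$(rg -l --hidden "token" "$demo_dir" | sort)
  # --no-ignore additionally turns off the ignore rules.
  all_files=$(rg -l --hidden --no-ignore "token" "$demo_dir" | sort)
  printf 'default:\n%s\n\n' "$default_files"
  printf 'with --hidden:\n%s\n\n' "$hidden_files"
  printf 'with --hidden --no-ignore:\n%s\n' "$all_files"
else
  echo "rg is not on PATH; skipping the demo"
fi

rm -rf "$demo_dir"
```

So --hidden alone does not bring back ignored files; for those we need --no-ignore.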

List only files

The flag -l lists only the names of the files that contain the text.

rg -l "<regex>" . # only files
rg -l. "<regex>" . # only files including hidden files

[Screenshot: rg only files]

Exact match

If we want the pattern to match whole words only, we can use the flag --word-regexp or -w.

rg --word-regexp "<exact_text>" .
rg -w "<exact_text>" .

[Screenshot: rg exact match]

As above, "Boston" is found because it exists in those files as a whole word, but "oston" is not found because it is only a substring, not a word.
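The same behavior can be reproduced with a small sketch. The sample file and its two lines are made up for the demo; the second line, "Bostonian", is there to show that -w also rejects a longer word that merely starts with the pattern.

```shell
# Hypothetical sample file, created only for this demo.
demo_dir=$(mktemp -d)
printf 'Boston\nBostonian\n' > "$demo_dir/cities.txt"

if command -v rg >/dev/null 2>&1; then
  # Whole-word match: only the line "Boston" qualifies.
  whole_word=$(rg -w "Boston" "$demo_dir/cities.txt" || true)
  # Plain substring match: both lines contain "oston".
  substring=$(rg "oston" "$demo_dir/cities.txt" || true)
  # Whole-word match on a substring: nothing is found (rg exits with status 1).
  partial=$(rg -w "oston" "$demo_dir/cities.txt" || true)
  printf 'rg -w Boston:\n%s\n\n' "$whole_word"
  printf 'rg oston:\n%s\n\n' "$substring"
  printf 'rg -w oston:\n%s\n' "$partial"
else
  echo "rg is not on PATH; skipping the demo"
fi

rm -rf "$demo_dir"
```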


Ripgrep all

rga, or “ripgrep all”, is a separate tool built on top of rg that has to be installed on its own. It searches more file types than rg, such as .pdf, .docx, etc., while rg works on text-based files such as .md, .js, .py, etc.

rga accepts the command-line options of rg and adds a few of its own for searching the extra file types.

Search in DOCXs

Say we have Microsoft Word files. Plain rg finds nothing in them, but rga returns results.

rga "<regex>" .

[Screenshot: rga docx]

Search in PDFs

As with DOCX files, we can use rga on PDF files. It also displays the page number of each match.

rga "<regex>" .

[Screenshot: rga pdf]

Search a set of words (in order)

We know that the regex "regex_1|regex_2" means that we are looking for regex_1 or regex_2, and that "regex_1.*regex_2" means that we are looking for regex_1 followed by regex_2 on the same line, in that order.

But how can we search for regex_1 and regex_2 in the same file but not in the same line?

We add the flag --multiline to tell it to search across multiple lines. With that flag alone, the dot still does not match a new-line character, so to let ".*" span several lines we also put (?s) at the start of the regex or add the flag --multiline-dotall. [2]

rga --multiline "<regex_1>.*<regex_2>" .   # search across lines; the dot still stops at a new line
rga -U "<regex_1>.*<regex_2>" .            # -U is the short form of --multiline

rga --multiline "(?s)<regex_1>.*<regex_2>" .   # (?s) lets the dot match new-line characters
rga -U "(?s)<regex_1>.*<regex_2>" .            # same, with the short flag

rga --multiline --multiline-dotall "<regex_1>.*<regex_2>" .   # same effect as (?s)
rga -U --multiline-dotall "<regex_1>.*<regex_2>" .            # same, with the short flag

[Screenshot: rga multiline]
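The effect of the dotall setting is easy to miss, so here is a minimal sketch of it. It uses rg rather than rga, on the assumption that the multiline flags behave the same in both, and the file name and the words "alpha" and "omega" are made up for the demo.

```shell
# Hypothetical sample file: the two words sit on different lines.
demo_dir=$(mktemp -d)
printf 'alpha starts here\nfiller line\nomega ends here\n' > "$demo_dir/notes.txt"

if command -v rg >/dev/null 2>&1; then
  # -U alone: the dot does not match a new line, so nothing is found.
  no_dotall=$(rg -U -l "alpha.*omega" "$demo_dir" || true)
  # (?s) inside the regex lets the dot match new lines.
  inline_flag=$(rg -U -l "(?s)alpha.*omega" "$demo_dir" || true)
  # --multiline-dotall does the same without touching the regex.
  dotall_flag=$(rg -U --multiline-dotall -l "alpha.*omega" "$demo_dir" || true)
  printf 'with -U only: [%s]\n' "$no_dotall"
  printf 'with (?s): [%s]\n' "$inline_flag"
  printf 'with --multiline-dotall: [%s]\n' "$dotall_flag"
else
  echo "rg is not on PATH; skipping the demo"
fi

rm -rf "$demo_dir"
```

Only the last two commands report notes.txt; the first one prints an empty result.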

Search a set of words (regardless of order)

Then how do we search for several words regardless of their order?

We need a trick to do so.

rga "<regex_2>" $(rga "<regex_1>" -l .)

This searches for the files containing regex_1 first, and then searches for regex_2 only in those files.

The output shows only the matches for regex_2, so we need to make sure on our own that regex_1 is correct.

[Screenshot: rga word set]
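One caveat of the $(...) trick is that the shell splits the file list on whitespace, so a file name containing a space breaks it, and an empty list makes the outer command search the whole folder. Below is a sketch of a variant that reads the list line by line instead. It uses rg on plain text files, and the file names and the words "alpha" and "omega" are made up for the demo.

```shell
# Hypothetical sample files; one name contains a space on purpose.
demo_dir=$(mktemp -d)
printf 'omega first\nalpha second\n' > "$demo_dir/both words.txt"
printf 'alpha only\n' > "$demo_dir/alpha.txt"
printf 'omega only\n' > "$demo_dir/omega.txt"

if command -v rg >/dev/null 2>&1; then
  # Step 1: list the files containing "alpha", one path per line.
  # Step 2: search each of those files for "omega".
  # -H prints the file name and -n the line number, even for a single file.
  both=$(rg -l "alpha" "$demo_dir" | while IFS= read -r file; do
    rg -H -n "omega" "$file" || true
  done)
  printf '%s\n' "$both"
else
  echo "rg is not on PATH; skipping the demo"
fi

rm -rf "$demo_dir"
```

Only "both words.txt" is reported, even though "omega" comes before "alpha" in that file. This version still assumes that file names contain no new-line characters.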


References

This post is licensed under CC BY 4.0 by the author.