Regular expressions are a useful tool that makes life easier for many SEOs. You may come across them in .htaccess or Google Analytics, and at first everything seems very confusing. But once you start to understand them, you will get a taste for them and see how these constructs become a powerful tool for working with text data.
In this blog, we will explain in simple language what Regex or regular expressions are, give examples, and show how to apply them in practice. The material will be useful to everyone who is somehow connected with data processing in SEO.
What are regular expressions?
A regular expression (Regular Expression, RegExp, or RegEx) is a pattern that describes what to search for in a text string.
Using this formal pattern language, you can extract from the text, for example, telephone numbers, email addresses, any pieces of text, and so on.
RegExp is often used by programmers when validating entered data or writing parsers, but SEO specialists also have to deal with regular expressions when working with Google Analytics, Yandex.Metrica, RewriteRule in .htaccess, or even text editors for quick search and replacement of strings.
Basics of regular expressions
Let's consider a popular example using regular expressions to configure redirects on a site from the "non-www" version to the www domain.
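# Redirect any host that does not start with "www." to the www version.
# A negated condition (!) cannot capture groups, so the host is taken from %{HTTP_HOST}.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]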
The regular expressions here are the patterns that follow RewriteCond %{HTTP_HOST} and RewriteRule. What do these dots and other symbols mean?
It looks very confusing, and to figure it out you need to understand RegExp syntax.
«^» — caret (circumflex, or informally a "hat"). Beginning of a line
This character marks the beginning of a line (when it is not used inside the "[]" construct). For example, if you want to find keywords that start with the word "buy", the construct is simply ^buy. Without this character, all keywords containing the word "buy" will be found, not necessarily at the beginning.
For example, you can use this in Google Analytics advanced filters.
You may object: why use regular expressions when you can do without them? Google Analytics filters already have a "starts with" option. We completely agree; this example was given only to explain the syntax. Later we will see that combinations of different constructs solve tasks that are difficult to handle without regular expressions.
«$» — dollar sign. End of line
Unlike the caret, the dollar sign marks the end of a line. So the construction Kyiv$ will find all phrases ending with the word "Kyiv".
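To make the two anchors concrete, here is a minimal sketch in Python. The keyword list is hypothetical, and Python's re module is only a stand-in for the regex engine of your analytics tool, although ^ and $ behave the same way in the common engines.

```python
import re

# Hypothetical sample keywords, not taken from any real report.
keywords = [
    "buy tickets to Kyiv",
    "where to buy a sofa",
    "Kyiv weather",
    "buy stairs",
]

# ^buy: the keyword must START with "buy".
starts_with_buy = [k for k in keywords if re.search(r"^buy", k)]

# Kyiv$: the keyword must END with "Kyiv".
ends_with_kyiv = [k for k in keywords if re.search(r"Kyiv$", k)]

print(starts_with_buy)
print(ends_with_kyiv)
```

Note that "where to buy a sofa" contains "buy" but is not selected by ^buy, and "Kyiv weather" contains "Kyiv" but is not selected by Kyiv$.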
«.» — period. Any character
A period matches any character, but only one. The dot is rarely used by itself; most often it is combined with other constructs, for example ".*".
«*» — multiplication sign, asterisk. Any number of preceding characters
The asterisk means any number of repetitions of the character (or group of characters) written before it, including zero repetitions. Together with the preceding «dot» character it forms the convenient construction «.*», which means any number of any characters.
For example, in the RewriteRule from the redirect example above, the pattern ^(.*)$ matches any requested path, so every page of the site is redirected to a new URL.
«+» — plus sign. One or more of the preceding character.
The plus sign differs from the preceding "*" sign in that the character must occur at least once.
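The difference between "*" and "+" is easiest to see side by side. The sketch below uses hypothetical URL paths and Python's re module purely as an illustration.

```python
import re

# Hypothetical sample paths.
samples = ["/blog/", "/blog/post-1", "/blog"]

# ^/blog/.*$ : "/blog/" followed by ZERO or more characters.
star = [s for s in samples if re.search(r"^/blog/.*$", s)]

# ^/blog/.+$ : "/blog/" followed by AT LEAST ONE character.
plus = [s for s in samples if re.search(r"^/blog/.+$", s)]

print(star)
print(plus)
```

With "*" the section page "/blog/" itself is included; with "+" only addresses that have something after the slash are selected.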
«?» — question mark. Optional occurrence of the last character
The question mark indicates that the last character or group may or may not occur in the text (i.e., their occurrence is optional).
This is useful when you don't know, for example, whether there will be a slash at the end of the address or not:
^/articles/?$
Or, for example, when you are searching for keywords and want to allow for certain spelling variations:
buy stairs?
This expression will find all the keywords of your audience that contain the phrases "buy stair" and "buy stairs".
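As a small illustration of the optional character, the trailing-slash pattern ^/articles/?$ can be checked against a few hypothetical paths. Python's re module is used here only as a convenient way to try the pattern out.

```python
import re

# Hypothetical paths to test the optional trailing slash.
paths = ["/articles", "/articles/", "/articles/seo", "/articles//"]

# "/?" means the final slash may appear once or not at all.
section_pages = [p for p in paths if re.search(r"^/articles/?$", p)]

print(section_pages)
```

Only the two spellings of the section address are selected; a deeper page or a doubled slash is not.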
«( )» — parentheses. Grouping of constructs.
Similar to their use in mathematics, parentheses in regular expressions are used for grouping. For a group of characters or rules, you can specify other rules.
For example, we need to redirect all users from the subfolder "domain.com/blog/" to the subdomain blog.domain.com:
RewriteRule ^blog/(.*)$ http://blog.domain.com/$1 [R=301,L]
Here, the rule ^blog/(.*)$ means that the address starts with blog/, followed by some sequence of characters (for example, the address of a blog article).
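The way the parentheses capture part of the address and $1 reuses it can be imitated in a few lines of Python. This is only a sketch of the idea, not of how Apache works internally; the function name rewrite_blog_path is hypothetical.

```python
import re
from typing import Optional


def rewrite_blog_path(path: str) -> Optional[str]:
    """Imitate the rule: ^blog/(.*)$ -> http://blog.domain.com/$1."""
    match = re.search(r"^blog/(.*)$", path)
    if match is None:
        return None  # the rule does not apply to this address
    # match.group(1) plays the role of $1: whatever the parentheses captured.
    return "http://blog.domain.com/" + match.group(1)


print(rewrite_blog_path("blog/seo/regex-basics"))
print(rewrite_blog_path("about/"))
```

The first call returns the article address on the subdomain, and the second returns None because the path does not start with blog/.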
«|» — a vertical line. The "OR" operator.
The vertical line indicates the OR operator when we need to list certain options in the search. Let's say we are looking for keywords that contain the word "buy" or "purchase":
buy|purchase
Or we want to see statistics for several sections - articles (/articles/) and press releases (/pr/):
^/(articles|pr)/
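A quick way to see what the section pattern selects is to run it over a handful of paths. The paths below are hypothetical and Python is used only for illustration.

```python
import re

# Hypothetical page paths.
paths = ["/articles/regex", "/pr/new-office", "/products/pr/", "/press/"]

# The group (articles|pr) must come right after the leading slash.
selected = [p for p in paths if re.search(r"^/(articles|pr)/", p)]

print(selected)
```

Because of the ^ anchor, "/products/pr/" is not selected even though it contains "/pr/".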
Or let's take another example. Let's say we want to block the admin, login, register, and some other sections from being indexed by search engines.
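In order not to dig into the site code, you can do this with a few lines in .htaccess, using the X-Robots-Tag HTTP header, which most major search engines understand. The sections themselves can be described by one pattern with the OR operator, for example (admin|login|register), and for the matching addresses the server sends the header: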
Header set X-Robots-Tag "noindex, nofollow"
The best-known and most widely used search engines (Google, Yandex, Bing, DuckDuckGo) generally honor such blocking directives. But some countries have local search engines that may not accept, for example, indexing instructions passed through HTTP headers.
Before optimizing a site for a particular region, it is worth finding out which systems are used by local residents and studying the features of working with them.
«[ ]» — square brackets. Any of the listed characters.
You can list characters in square brackets, and any one of them may occur at that position in the text. If the first character inside the brackets is "^" (caret), the set is negated: the character at that position must not be any of those listed.
In order not to list some popular sequences, such as the entire alphabet or a series of numbers, you can use a range: 0-9 means the range from 0 to 9, a-c means the range of characters from "a" to "c".
Let's say I'm interested in how people found the site when they were looking for list-style articles (titles starting with a number, such as "10 best..." or "15 most...").
^[0-9]+
Here we will see that many people searched for 301 redirects, which is not what we were looking for, so in the advanced filter we also exclude everything containing 301.
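The two-step filtering described above (include queries that start with a number, then exclude those containing 301) can be sketched as follows. The queries are hypothetical sample data.

```python
import re

# Hypothetical search queries.
queries = [
    "10 best seo tools",
    "301 redirect htaccess",
    "15 most common errors",
    "best 10 plugins",
]

# Step 1: keep queries that start with one or more digits.
starts_with_number = [q for q in queries if re.search(r"^[0-9]+", q)]

# Step 2: drop everything that contains "301".
list_articles = [q for q in starts_with_number if not re.search(r"301", q)]

print(starts_with_number)
print(list_articles)
```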

"{ }" - curly braces. Repeating a character several times.
Curly braces indicate how many times a character or group of characters is repeated. If two numbers separated by a comma are given in the braces, they define a range "from ... to ...".
For example, to find a zip code in the text that is 6 digits long and starts with 14, you can use the following regular expression.
14[0-9]{4}
Here we specified 14, and then a sequence of numbers that is repeated 4 times, together the total length will be 6.
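One thing to keep in mind is that 14[0-9]{4} also matches six digits inside a longer number. The sketch below shows this on a hypothetical text and adds word boundaries (\b) as one possible tightening; the \b markers are an addition for illustration and are not part of the original pattern.

```python
import re

# Hypothetical text containing zip codes and an unrelated longer number.
text = "Send to 140007 or 143000, not 190000; order no. 2140001."

# The pattern as given: "14" followed by exactly four digits.
loose = re.findall(r"14[0-9]{4}", text)

# Same pattern with word boundaries, so digits inside longer numbers are ignored.
strict = re.findall(r"\b14[0-9]{4}\b", text)

print(loose)
print(strict)
```

The loose pattern also picks "140001" out of the order number 2140001, while the bounded version returns only the two real zip codes.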
A more complex example:
www\.domain\.[a-z]{2,6}
This matches the main domain in any domain zone that is two to six letters long, including www.domain.ru and www.domain.travel.
An even more complicated example: we need separate statistics for two-, three-, and four-word queries. To do this, in the Google Analytics keyword report, we use a filter like this one (shown here for three-word queries):
^[^\s]+(\s[^\s]+){2}$
The "\s" construction means a whitespace character, which is what separates words. Here [^\s]+ indicates that the phrase must begin with one or more non-whitespace characters (a word), followed by a space and then another word.
The last two elements, "space + word", must occur exactly 2 times (the "(...){2}" construction). This way we get a list of all three-word queries and statistics about them; changing {2} to {1} or {3} gives two-word and four-word queries.
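The same pattern can be generated for any number of words by changing the number in the braces. Below is a minimal sketch; the helper word_count_pattern and the sample queries are hypothetical.

```python
import re


def word_count_pattern(n: int) -> str:
    """Build ^[^\\s]+(\\s[^\\s]+){n-1}$ : one word plus (n - 1) 'space + word' pairs."""
    return rf"^[^\s]+(\s[^\s]+){{{n - 1}}}$"


# Hypothetical search queries.
queries = [
    "regex",
    "regex for seo",
    "buy stairs",
    "how to use regex",
    "google analytics regex filter",
]

by_length = {
    n: [q for q in queries if re.search(word_count_pattern(n), q)]
    for n in (2, 3, 4)
}

print(word_count_pattern(3))
print(by_length)
```

For n = 3 the helper produces exactly the filter shown above, and each query lands in the group that corresponds to its number of words.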
«\» — backslash. Escaping special characters.
Regular expression syntax uses periods, question marks, and other characters that you may also need to search for literally. In this case, the backslash helps. For example, to search for a period, we escape it: "\."; the same applies to other special characters.
For example, in Google Analytics we have configured one of the goals to track the use of internal search.
A visitor has used search if the "/?q=" construction appears in the URL.

In the settings it looks like this: "/\?q\=".
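To see why the escaping matters, compare the escaped and unescaped versions of the search pattern on a few hypothetical URLs. The sketch uses Python's re module; re.escape is a standard-library helper that escapes special characters automatically.

```python
import re

# Hypothetical URLs; only the first two are internal searches.
urls = ["/?q=regex", "/search/?q=htaccess", "/catalog?faq=1"]

escaped = r"/\?q="    # literal "/?q="
unescaped = r"/?q="   # here "?" makes the slash optional, so any "q=" matches

with_escaped = [u for u in urls if re.search(escaped, u)]
with_unescaped = [u for u in urls if re.search(unescaped, u)]

# re.escape builds a safe pattern from the literal text.
with_auto_escape = [u for u in urls if re.search(re.escape("/?q="), u)]

print(with_escaped)
print(with_unescaped)
```

Without the backslash the pattern also catches "/catalog?faq=1", because the "q=" inside "faq=1" is enough for a match.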
There are other special characters for working with regular expressions; a full list can be found on Wikipedia. But the ones above should be enough for the basic tasks of an SEO specialist.
Where can an SEO specialist use RegEx?
Google Analytics
Google Analytics is considered one of the main SEO tools.
Google Analytics supports regular expressions, which allows you to create more flexible definitions for filters, goals, segments, audiences, content groups, channel groups, etc.
Very often, analyzing the behavior and paths of users on a site helps to find new and effective methods of SEO site promotion. RegEx can be used to segment the most popular pages and further analyze the popularity of groups of pages.
For example, using RegEx to segment pages allows you to analyze traffic and bounces based on content types on a much larger scale than when using traditional operators.
Google Search Console
Determining user intent is an important task for an SEO, and regular expressions help segment queries based on the primary intent of users, i.e., the reason they are searching for something. This is a crucial component of any digital marketing strategy.
RegEx is most commonly used for branded versus non-branded analysis. With a RegEx pattern that matches your brand name, queries can be split into these two segments in a couple of clicks.
You can find more GSC lifehacks in this article:
Google Search Console: How to Add a Site and SEO Lifehacks - Idea Digital Agency Blog
RegEx patterns can be used to segment your audience based on what they were thinking and searching for when they found your site.
You can also use RegEx filters to break down URLs so you can understand which sections of the site your traffic goes to and how that changes. The intent with which customers find your site is reflected in the page they land on.
Ranking
RegEx can be used to segment rankings based on page types for the highest-ranking URLs for a keyword.
Using the same RegEx patterns as in GSC, it is possible to analyze rankings by keyword segments, for example, how search results show rankings for branded and non-branded keywords.
Crawler Site Audit
RegEx can be used to create patterns that help match a string or text. During a site audit, it can be used to:
Segment crawled pages based on URL patterns to manage crawl analysis for a large group of pages on a corporate site.
Search the text of pages during crawling.
Log analysis
Regular expressions also help analyze the log files that record how search engines crawl your site. Log files are usually broken down and analyzed by User-Agent for the different search engine robots.
Since log files for large sites can contain a very large number of entries, using RegEx patterns to segment crawled URLs simplifies the overall analysis and allows filtering based on complex criteria.
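As a rough sketch of this kind of log analysis, the example below counts robot visits by User-Agent and by site section. The log lines are hypothetical and only resemble a typical combined access-log format; a real log may need a different pattern.

```python
import re
from collections import Counter

# Hypothetical access-log lines (combined-log-like format).
log_lines = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /catalog/mobile/apple HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '157.55.39.1 - - [10/Oct/2023:13:56:01 +0000] "GET /blog/regex HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '66.249.66.1 - - [10/Oct/2023:13:57:12 +0000] "GET /catalog/mobile/htc/desirev HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:58:40 +0000] "GET /catalog/mobile/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

# Group 1: requested path; group 2: the last quoted field (the User-Agent).
line_re = re.compile(r'"GET (\S+) HTTP/[0-9.]+".*"([^"]*)"$')
# The OR operator lists the robots we are interested in.
bot_re = re.compile(r"(Googlebot|bingbot|YandexBot)")

hits_by_bot = Counter()
catalog_hits_by_bot = Counter()
for line in log_lines:
    line_match = line_re.search(line)
    if line_match is None:
        continue  # the line does not look like a GET request
    path, agent = line_match.groups()
    bot_match = bot_re.search(agent)
    if bot_match is None:
        continue  # an ordinary visitor, not one of the listed robots
    bot = bot_match.group(1)
    hits_by_bot[bot] += 1
    if re.search(r"^/catalog/", path):
        catalog_hits_by_bot[bot] += 1

print(dict(hits_by_bot))
print(dict(catalog_hits_by_bot))
```

The same approach scales to other segments: replace ^/catalog/ with any URL pattern that describes the group of pages you want to analyze.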
Examples of using Regex in SEO tools
Example of using regular expressions in Search Console
For example, we are looking for all mentions of coronavirus in queries:
(?i)([ck]ovid|корон[ао]\s?вірус)
This query will find, regardless of case, all occurrences of such phrases as kovid, covid, коронавірус, корона вірус, короно вірус.
Or we are looking for what queries users use to find a gastronomy establishment called fairy house:
(?i)(кафе|ресторан|бар)\s(f[ae]ir[yi]|фе[ий]ри|сказочный)\s?(h[ao]u[sz]e*|хау[зс]|дом)
With this query we will find a range of variations of the name, from "бар fairy house" or "кафе fairyhouse" to "ресторан сказочный дом", taking into account various possible user errors.
The order of the characters inside brackets [ ] does not matter: a character class matches any single one of the listed characters, so in the example above [зс] and [сз] behave identically.
Example of use in Google Analytics
For example, if you want to exclude statistics about your employees visiting your site, you can set up a regular expression filter that matches all company IP addresses. Let's say this is the range 198.51.100.1 - 198.51.100.25. To avoid entering each of the 25 IP addresses, create a regular expression like 198\.51\.100\.\d* that covers the entire range (note that it also matches any other address beginning with 198.51.100.).
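If the filter has to cover only the addresses from .1 to .25 and nothing else, the last octet can be spelled out with the OR operator. The stricter pattern below is a suggested variant, not part of the original example; the IP list is hypothetical test data.

```python
import re

# Hypothetical addresses: two inside the office range, three outside it.
ips = [
    "198.51.100.1",
    "198.51.100.25",
    "198.51.100.26",
    "198.51.100.200",
    "198.51.101.5",
]

# Pattern from the text: any address that begins with 198.51.100.
loose = r"198\.51\.100\.\d*"

# Stricter variant: last octet is 1-9, 10-19, or 20-25, and nothing may follow.
strict = r"^198\.51\.100\.([1-9]|1[0-9]|2[0-5])$"

loose_matches = [ip for ip in ips if re.search(loose, ip)]
strict_matches = [ip for ip in ips if re.search(strict, ip)]

print(loose_matches)
print(strict_matches)
```

The shorter pattern is fine when the whole 198.51.100.x block belongs to the company; the stricter one is safer when it does not.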
If you need a filter that includes campaign-specific entries from only two cities, you can create a regular expression like Dnipro|Kyiv (Dnipro or Kyiv).
Closing the WordPress control panel in .htaccess, opening it only for your IP address
# Apache 2.2 syntax; on Apache 2.4+ the equivalent is "Require ip 200.20.21.145"
Order Deny,Allow
Deny from all
Allow from 200.20.21.145
Where 200.20.21.145 is, for example, your IP address.
Isolating non-branded search traffic
Let's say you have an online store called "goodshop.ru". Using Google Analytics, you would like to separate search traffic for queries that do not contain your store name from branded search traffic.
Tracking changes in non-branded traffic over time is one way to evaluate the effectiveness of your SEO efforts. To do this, you can create a custom report in Google Analytics with a filter that excludes branded queries. There can be many different spellings of your store name (don't forget about typos and incorrect keyboard layouts). Using a regular expression eliminates the need to create a separate filter field for each option.
Regular expression
goodshop|гудшоп|good shop|гуд шоп|good-shop|пщщвырщз
Special characters used:
| - the symbol acts on the principle of a logical OR operator
When forming this regular expression, we simply list all the main possible query options related to the name of your store to exclude them from the report. And don't forget to set the match type to "Regular Expression" when setting up the filter.
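To try the brand pattern on sample data before putting it into a report, a sketch like the following can help. The queries are hypothetical, and the re.IGNORECASE flag is an added assumption so that capitalized spellings are caught as well.

```python
import re

# The brand pattern from the text; IGNORECASE is an addition for this sketch.
brand_re = re.compile(
    r"goodshop|гудшоп|good shop|гуд шоп|good-shop|пщщвырщз",
    re.IGNORECASE,
)

# Hypothetical search queries.
queries = [
    "goodshop discount",
    "buy phone online",
    "Гудшоп отзывы",
    "good-shop delivery",
    "cheap laptops",
]

branded = [q for q in queries if brand_re.search(q)]
non_branded = [q for q in queries if not brand_re.search(q)]

print(branded)
print(non_branded)
```

The three Latin spellings could also be collapsed into one alternative, good[ -]?shop, using the optional-character and character-set constructs described earlier.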

Select a specific category of pages on the site
Sometimes, for statistics on visitor interaction with the site content, it is necessary to select a specific group of pages for analysis. For example, compare the engagement rates of pages from a specific section of the catalog. Let's say we have a website selling various electronic gadgets. The site has a mobile phone section with a three-level hierarchy:
Level 1 - the main page of the mobile phone subdirectory:
/catalog/mobile/
Level 2 - listings of mobile phones by brand:
/catalog/mobile/apple
/catalog/mobile/samsung
/catalog/mobile/htc
Level 3 - the product cards themselves:
/catalog/mobile/apple/iphone5
/catalog/mobile/samsung/galaxys3
/catalog/mobile/htc/desirev
We need to highlight only product card pages in the Google Analytics content report.
Regular expression
/catalog/mobile/.+/.+
Special characters used:
. — denotes any character: punctuation mark, letter, number.
+ - denotes the number of repetitions of the previous character: 1 time or more.
In this case, the combination of special characters .+ denotes any string consisting of at least one arbitrary character. Because the site has a clear structure, we know that the URI of a product card consists of four fragments separated by slashes.
We write only the first two of them literally in the regular expression, because we only need pages from the mobile phone section.
The remaining two parts, brand and model, are specified with two .+ combinations separated by a slash.
This is how we defined the address template of the product card page, which we copy into the report filter field.
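The template can be checked against the page addresses listed above. The sketch also includes a stricter variant with [^/]+ (any characters except a slash), which is an added suggestion for sites that have pages nested deeper than the product card; the "/reviews" address is hypothetical.

```python
import re

paths = [
    "/catalog/mobile/",
    "/catalog/mobile/apple",
    "/catalog/mobile/apple/iphone5",
    "/catalog/mobile/samsung/galaxys3",
    "/catalog/mobile/htc/desirev",
    "/catalog/mobile/apple/iphone5/reviews",  # hypothetical deeper page
]

# Pattern from the text: "." also matches a slash, so deeper pages match too.
loose = r"/catalog/mobile/.+/.+"

# Stricter variant: exactly two more fragments, neither containing a slash.
strict = r"^/catalog/mobile/[^/]+/[^/]+$"

loose_matches = [p for p in paths if re.search(loose, p)]
strict_matches = [p for p in paths if re.search(strict, p)]

print(loose_matches)
print(strict_matches)
```

On the three-level structure described in the text both patterns give the same result; they differ only when a fourth level exists.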

Tracking the execution of target actions on the site
Sometimes regular expressions can be useful when setting up goals in Google Analytics. Let's take as an example the site of a foreign language school that offers its students courses in four foreign languages. The site has an application form that visitors can send indicating the foreign language to study.
At the same time, visitors can choose more than one language as the desired one to study. The project management set itself the task of finding out how often visitors choose more than one language. Accordingly, it is necessary to set up a goal in Google Analytics. After submitting the request, a page with one of the following URIs is displayed.
/order?lang=eng
/order?lang=eng&esp
/order?lang=eng&esp&ita
/order?lang=eng&esp&ita&fra
Obviously, in the goal settings, you need to specify a regular expression that will match the last three URIs.
Regular expression
/order\?lang=.{3}&
Special characters used:
{ } — in curly brackets you can specify the number of repetitions of the previous character. Accordingly, the combination of characters .{3} means a sequence of any three characters. In our case, the three characters are the language designation. Don't forget to escape the question mark, which is also a special character in regular expressions.
In this way, the regular expression will match all URIs in which "lang=" is followed by three characters and an ampersand. These pages are displayed only when more than one language is specified in the submitted form, which is exactly what needed to be tracked.
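The goal pattern can be verified against the four URIs listed above with a short Python check, used here only as an illustration.

```python
import re

uris = [
    "/order?lang=eng",
    "/order?lang=eng&esp",
    "/order?lang=eng&esp&ita",
    "/order?lang=eng&esp&ita&fra",
]

# "\?" is a literal question mark; ".{3}&" is a three-letter code followed by "&".
goal_pattern = r"/order\?lang=.{3}&"

multi_language = [u for u in uris if re.search(goal_pattern, u)]

print(multi_language)
```

The first URI has no ampersand after the language code, so it is the only one left out.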
How to check that a regular expression is written correctly
To avoid errors when using regular expressions, you can test them before using them by applying an advanced filter in any analytics reports.
You can also check regular expressions on a test sample using special tools. We recommend using the Regex Pal website or the RegExp Tester browser extension.
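If you prefer to keep your test samples next to the patterns, the same check can be scripted. Below is a minimal sketch in Python; the helper check_pattern is hypothetical, and Python's re module may differ in details from the engine used by your analytics tool.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of problems; an empty list means the pattern behaves as expected."""
    problems = []
    for text in should_match:
        if not re.search(pattern, text):
            problems.append(f"expected a match: {text!r}")
    for text in should_not_match:
        if re.search(pattern, text):
            problems.append(f"unexpected match: {text!r}")
    return problems


# Hypothetical test sample for the trailing-slash pattern discussed earlier.
problems = check_pattern(
    r"^/articles/?$",
    should_match=["/articles", "/articles/"],
    should_not_match=["/articles/seo", "/my-articles/"],
)
print(problems or "pattern behaves as expected")
```

Listing both the strings that must match and the strings that must not match is what catches patterns that are too broad.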
For those who are just starting to master regular expressions, as well as experienced specialists who want to test their skills in using RegEx constructs, we highly recommend visiting this website. Here you can practice writing regular expressions in a playful way.