What are regular expressions
Regular expressions (regexp) are a mechanism to validate your text input in BRYTER.
For instance, you can use Regular Expressions to check that your user has entered text in a specific format. Let’s say that you want to create a contract where the names of the parties have to be written in capital letters. You can ask your user to enter the names as a response in your module. Using a Regular Expression, you can check (“validate”) that your user has entered the names in capital letters. That is, you can ensure that your user inputs “JOHN” instead of “John”.
In general, regular expressions help you build reliable modules that collect correctly formatted information. They may seem a bit tricky at first, but this article should help you get started with building some basic validations for your BRYTER module.
If you don’t have time to learn the mechanism, just copy and paste the examples into your Text input. It is also good practice to use placeholder text that shows your users how to format their information and avoid confusion.
The benefits of using regular expressions
Regular Expressions help you build modules that are error-tolerant. Even if you add placeholder text and tips for your users, they might make errors or skip over your instructions. Using Regular Expressions, you can catch these errors while your user is going through your module so you don’t have to correct them later. This saves you time and resources.
They’re particularly useful if you want to build contracts that require a certain styling, such as only using uppercase letters for names. Of course, you will also want to use placeholder text and info blocks to tell your users to enter names in uppercase letters only. However, if a user overlooks these instructions, the Regular Expression rejects the input until it follows the required format.
Example use cases
[ 1 ] Only allow uppercase letters for one word
As mentioned above, you can restrict your user’s response to uppercase letters only. This is useful when you want to create a contract that requires names to be submitted in all uppercase letters. It’s good practice to write a placeholder text or tip instructing your users to use only uppercase letters in their response. To do so, enter the following expression:
^[A-Z]+$
You can simply copy and paste this into the appropriate Text input or keep reading for an explanation about the different parts of the Regular Expression.
[A-Z]
This part of the Regular Expression allows the user to enter any uppercase character from “A” to “Z”. In general, square brackets “[ ]” define a set of allowed characters, and a hyphen inside them specifies a range. However, note that this particular range [A-Z] only includes the 26 letters of the basic Latin alphabet used in English. Other uppercase letters, such as “Ä” or “Ü” in German, are not allowed. If you want to include these letters, simply add them inside the square brackets as follows:
[A-ZÄÖÜ]
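The following minimal Python sketch lets you compare the two character sets side by side. Python’s `re` module is only a stand-in here, so BRYTER’s own regex engine may differ in details. The helper `is_valid` is a hypothetical name. In line with the examples in this article, an input counts as valid when the pattern is found in it.

```python
import re


def is_valid(pattern: str, text: str) -> bool:
    # Hypothetical helper: the input passes if the pattern is found in it.
    return re.search(pattern, text) is not None


english_only = r"^[A-Z]+$"     # the 26 basic Latin uppercase letters only
with_umlauts = r"^[A-ZÄÖÜ]+$"  # the same set plus Ä, Ö and Ü

print(is_valid(english_only, "MUELLER"))  # True
print(is_valid(english_only, "MÜLLER"))   # False: Ü is not in A-Z
print(is_valid(with_umlauts, "MÜLLER"))   # True
```

Lowercase umlauts such as “ü” are still rejected by the extended set, because the set only lists the uppercase forms.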
More than one character
+
The plus sign “+” following the brackets means “one or more” of the preceding element. The user must enter at least one character and may enter more. Without the “+” symbol, only a single letter is allowed as a response.
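You can check the effect of the plus sign in a quick Python sketch. Python’s `re` module is used here as a stand-in for BRYTER’s engine, which is an assumption.

```python
import re

single = r"^[A-Z]$"    # exactly one uppercase letter
several = r"^[A-Z]+$"  # one or more uppercase letters

for text in ["J", "JOHN", ""]:
    print(repr(text),
          re.search(single, text) is not None,
          re.search(several, text) is not None)
```

“J” passes both patterns, but “JOHN” only passes the one with the plus sign. An empty input fails both, because “+” requires at least one character.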
^$
Finally, these two symbols are anchors: ^ marks the beginning of the input and $ marks its end. Together, they mean that the input may contain nothing other than what the pattern between them specifies.
Examples
The regex [A-Z]+ (without ^ and $) accepts any input that contains at least one character from A to Z somewhere. So ABC, abC and “small Capital and white spaces” are all accepted, while abc is rejected.
The regex ^[A-Z]+$ accepts only inputs made up entirely of characters from A to Z. So ABC is accepted, while abCd and “CAPITAL AND WHITE SPACES” are rejected.
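The examples above can be reproduced with a short Python sketch. Like the examples, it assumes that an input is accepted when the pattern is found anywhere in it. Python’s `re.search` behaves that way, but it is only a stand-in for BRYTER’s validation engine.

```python
import re

unanchored = r"[A-Z]+"   # passes if an uppercase letter appears anywhere
anchored = r"^[A-Z]+$"   # passes only if every character is an uppercase letter

samples = ["ABC", "abC", "small Capital and white spaces",
           "abc", "abCd", "CAPITAL AND WHITE SPACES"]

for text in samples:
    loose = re.search(unanchored, text) is not None
    strict = re.search(anchored, text) is not None
    print(repr(text), loose, strict)
```

Every sample except “abc” passes the unanchored pattern. Only “ABC” passes the anchored one.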
[ 2 ] Only allow uppercase letters for exactly two words
We can extend the code snippet above to check the same uppercase condition for exactly two words. From a technical perspective, that means some characters, followed by a space, followed by some more characters. In general, a Regular Expression is built from small validation blocks that you chain together in a single line. You can do this as follows:
^[A-Z]+\s+[A-Z]+$
As above, you can simply copy and paste this into the appropriate Text input or keep reading for an explanation about the different parts of the Regular Expression.
[A-Z]+
As discussed in the first example, this snippet allows one or more uppercase English letters.
\s
This matches a single whitespace character, such as a space, between the two strings of uppercase letters. The “+” after it allows one or more whitespace characters, so an accidental double space is still accepted. If you want to allow exactly one space, remove the “+” after \s.
^$
As in the first example, these two symbols anchor the pattern to the beginning and end of the input. Nothing other than the two words and the whitespace between them is accepted.
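Here is a minimal Python sketch of the two-word check, using the pattern anchored with both ^ and $. Python’s `re` module stands in for BRYTER’s engine, and `two_uppercase_words` is a hypothetical helper name.

```python
import re

TWO_WORDS = r"^[A-Z]+\s+[A-Z]+$"


def two_uppercase_words(text: str) -> bool:
    # Hypothetical helper mirroring a Text input validation.
    return re.search(TWO_WORDS, text) is not None


for text in ["JOHN DOE", "JOHN  DOE", "JOHN", "JOHN DOE SMITH", "John Doe"]:
    print(repr(text), two_uppercase_words(text))
```

“JOHN DOE” and “JOHN  DOE” (with two spaces) pass. “JOHN”, “JOHN DOE SMITH” and “John Doe” are rejected.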
[ 3 ] Only allow uppercase letters for multiple words
Let’s say your user has a second surname that needs to be entered into a contract. We need a more flexible validation here. The good thing is, we can simplify the code for our validation like this:
^[A-Z\s]+$
As above, you can simply copy and paste this into the appropriate Text input or keep reading for an explanation about the different parts of the Regular Expression.
To sum up: any character you list inside the square brackets “[ ]” is allowed. Here, the set contains the uppercase letters A to Z and whitespace (\s), so you decide exactly which characters may be entered in the Text input. The pattern does not prescribe a structure. Because the set is followed by “+” and enclosed in ^ and $, the input may contain any number of letters and words, as long as every character belongs to the set. You can add special characters such as “Ä” or “Ö” to the set in the same way.
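Here is the multi-word check as a small Python sketch, again with `re` standing in for BRYTER’s engine (an assumption). The second pattern is a hypothetical stricter variant for comparison, not part of the original guide.

```python
import re

MANY_WORDS = r"^[A-Z\s]+$"                   # uppercase letters and whitespace only
AT_LEAST_ONE_WORD = r"^[A-Z]+(\s+[A-Z]+)*$"  # hypothetical stricter variant

for text in ["JOHN DOE SMITH", "MARÍA LÓPEZ", "John", "   "]:
    print(repr(text),
          re.search(MANY_WORDS, text) is not None,
          re.search(AT_LEAST_ONE_WORD, text) is not None)
```

The last case shows that ^[A-Z\s]+$ also accepts an input made only of spaces. The stricter variant requires the input to start and end with a word, and it allows whitespace only between words.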
Cheatsheet
\s
This matches a single whitespace character, such as a space. Using this, we accept a “space” between words or letters.
\w
This matches a single “word character”: an English letter (a to z or A to Z), a digit (0 to 9) or the underscore “_”. It does not match letters such as “ÄÖÜ” or special characters like “!?%”. If you need those, you have to define your own character set.
\w+
This matches one or more word characters. Here we can allow a word to be entered without limiting its length.
\d
This matches a single digit. The user can type in one digit from 0 to 9.
\d+
This matches one or more digits. The user can enter a whole number of any length, but without a dot, comma or sign.
[abc]
This matches exactly one of the characters listed inside the brackets, here “a”, “b” or “c”. You can list letters as well as special characters, for example [!?%]. Remember that only one character is allowed to be typed in by the user. This validation is case-sensitive. For example, if you want to support both an “a” and an “A” character, your regular expression should look like this: [aA]
[abc]+
This matches one or more characters from the set, in any order, for example “cab” or “aa”.
[a-z]
This allows for any kind of lowercase letter between “a” and “z”.
[a-z]+
This allows for multiple lowercase letters between “a” and “z”.
[A-Z]
This allows for any uppercase letter between “A” and “Z”.
[A-Z]+
This allows for multiple uppercase letters between “A” and “Z”.
[0-9]
This allows for any kind of digit between “0” and “9”.
[0-9]+
This allows for multiple digits between “0” and “9”.
[^abc]
This matches any single character that is not inside the square brackets. Using “^” at the beginning of a set or range reverses its behavior. For example, if you want to forbid special characters like “!?%”, you would type in: [^!?%].
[^abc]+
This matches one or more characters, none of which is in the set. To forbid these characters anywhere in the input, add the anchors: ^[^abc]+$.
{5}
This specifies exactly how many times the preceding element must appear. If you want a country code that consists of exactly two uppercase letters, your regexp will look like: [A-Z]{2}
{1,5}
This specifies a range for the number of times the preceding element may appear, here anything from 1 to 5 characters.
^$
Remember to use these symbols if you want your regex rules to apply from the beginning and/or until the end of the input.
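The following Python sketch runs a few of the cheatsheet patterns against sample inputs so you can check your understanding. The `accepts` helper is hypothetical, and Python’s `re` module is a stand-in for BRYTER’s engine. The `re.ASCII` flag is needed because, by default, Python’s \w and \d also match non-English letters and digits.

```python
import re


def accepts(pattern: str, text: str) -> bool:
    # Hypothetical helper: True if the pattern is found in the input.
    # re.ASCII limits \w, \d and \s to ASCII characters.
    return re.search(pattern, text, re.ASCII) is not None


checks = [
    (r"^\w+$", "John_42", True),         # \w covers letters, digits and "_"
    (r"^\w+$", "Jürgen", False),         # ü is not a word character with re.ASCII
    (r"^\d+$", "1,5", False),            # \d+ allows no comma or dot
    (r"^[aA]$", "A", True),              # sets are case-sensitive, so list both cases
    (r"^[^!?%]+$", "Hello!", False),     # negated set: "!" is forbidden
    (r"^[A-Z]{2}$", "DE", True),         # exactly two uppercase letters
    (r"^[0-9]{1,5}$", "123456", False),  # six digits exceed {1,5}
]

for pattern, text, expected in checks:
    assert accepts(pattern, text) == expected, (pattern, text)
print("all checks passed")
```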
It’s worth remembering that although the regex \d+ reads as “multiple digits”, on its own it actually means “contains at least one digit”. By adding ^, we convert it to “starts with a digit” (^\d+). By adding $, we convert it to “ends with a digit” (\d+$). Finally, by adding both, we convert it to “contains digits only” (^\d+$).
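The difference between the four variants is easiest to see side by side. Below is a minimal Python sketch, with `re` standing in for BRYTER’s engine (an assumption) and `describe` as a hypothetical helper.

```python
import re

# Each pattern paired with what it actually checks.
PATTERNS = [
    (r"\d+", "contains at least one digit"),
    (r"^\d+", "starts with a digit"),
    (r"\d+$", "ends with a digit"),
    (r"^\d+$", "contains digits only"),
]


def describe(text: str) -> list[str]:
    # Return the descriptions of every pattern the input satisfies.
    return [meaning for pattern, meaning in PATTERNS
            if re.search(pattern, text, re.ASCII) is not None]


for text in ["2024", "Room 12", "12 rooms", "n/a"]:
    print(repr(text), describe(text))
```

For example, “Room 12” only contains and ends with a digit, while “2024” satisfies all four patterns.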
Use case: Build an email validator
Let’s say you want to build an email validator. You want to check that your user has input their email address correctly while they are using your module.
An email address needs to contain an “@” and a dot to be approved. Before the “@” you want to allow a variety of characters but after the “@” and the dot, you will be strict with your validation.
Let’s decompose the email address: john.doe@bryter.io
You start with the first part: “john.doe”. Here you allow a set of characters and a dot. Your regexp will look like this:
[\w.]+
Remember, \w allows letters, digits and the underscore, so you need to add the dot manually. Inside square brackets, the dot stands for a literal dot. The plus allows more than one character to be entered.
Then, we want to validate the “@”. This step is pretty straightforward because you just need a character set containing only the “@”: [@]. There is no plus this time, because exactly one “@” is allowed. Add this to your previous validation, so your regexp looks like this:
[\w.]+[@]
Now, we want to check for the email provider, “bryter”. Here, you follow the same approach as in the first step, but this time you don’t add the dot. Simply use a set containing \w, which looks like this: [\w]+. Your updated regex now looks like this:
[\w.]+[@][\w]+
Now, we have to check for the dot. You have to add the dot at this point to keep the order. If you added the dot to the set in the step above, you would allow multiple dots after the “@”, which you don’t want. So we make an explicit validation here: [.]. Note that the square brackets matter, because outside of them a dot matches any character.
[\w.]+[@][\w]+[.]
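The following Python sketch compares the partial pattern with a variant that uses a bare dot. It tests each against one address that has a dot after the “@” and one that does not. Python’s `re` module is a stand-in for BRYTER’s engine, which is an assumption.

```python
import re

literal_dot = r"[\w.]+[@][\w]+[.]"  # [.] only matches an actual dot
any_char = r"[\w.]+[@][\w]+."       # a bare . matches any character

for text in ["john.doe@bryter.io", "john.doe@bryterio"]:
    print(repr(text),
          re.search(literal_dot, text, re.ASCII) is not None,
          re.search(any_char, text, re.ASCII) is not None)
```

Only the version with [.] rejects the address that has no dot after the “@”. Writing \. instead of [.] works the same way.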
Finally, you want to add the top-level domain: “io”. Here you only want to allow lowercase letters, so you will use:
[a-z]+
This is your final regexp:
[\w.]+[@][\w]+[.][a-z]+
Now you can test it on your own by typing in a few email addresses, and adjust it to your needs. If the part before the “@” should allow special characters like “ÄÖÜ”, just add them inside the square brackets of the first block: [\wÄÖÜ.]+
If the input should contain nothing but the email address, add the symbols ^ and $ as well:
^[\w.]+[@][\w]+[.][a-z]+$
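You can also test the anchored pattern outside BRYTER with a small Python sketch. It uses Python’s `re` module as a stand-in for BRYTER’s engine (an assumption). The `re.ASCII` flag ensures that \w only covers English letters, digits and the underscore. `is_email` is a hypothetical helper.

```python
import re

# Final pattern from the walkthrough, anchored with ^ and $.
EMAIL = r"^[\w.]+[@][\w]+[.][a-z]+$"


def is_email(text: str) -> bool:
    # re.ASCII keeps \w to letters, digits and the underscore.
    return re.search(EMAIL, text, re.ASCII) is not None


for text in ["john.doe@bryter.io", "john.doe@bryter", "john@@bryter.io",
             "john.doe@mail.bryter.io", "John.Doe@bryter.IO"]:
    print(repr(text), is_email(text))
```

Only the first address passes. The last two cases show how strict this pattern is. Addresses with a subdomain after the “@” (mail.bryter.io) are rejected, and so are addresses with an uppercase top-level domain. Widen the character sets if your users may enter such addresses.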
Use case: Validate a flight number
Let’s say you build a flight rights module that helps people claim compensation for delayed or canceled flights. In the end, a contract is created which they can send to the airline. Part of that contract is the correct flight number. To avoid errors, validate the flight number input with a Regular Expression.
A flight number consists of a two-character airline designator and a number with 1 to 4 digits.
Let’s decompose the flight number: LH 3442
First, validate the first part: “LH”. You need to limit the input to two uppercase characters. Start by specifying a range within square brackets: [A-Z]. Now you need to allow exactly two characters. To set the number of characters, use curly brackets containing the number, in this case {2}. (A few airline designators contain a digit, such as U2; to accept those, use [A-Z0-9]{2} instead.) So far, your Regular Expression looks like this:
[A-Z]{2}
After the first two characters, you will need to add a space using \s. Your Regular Expression should now look like this:
[A-Z]{2}\s
Now we need to validate the flight number itself, which is a number with 1 to 4 digits. For digits, you can use \d (written here as [\d], which is equivalent). Then add a quantifier that allows 1 to 4 digits by appending {1,4}. The comma acts like a “to”, so this part of the Regular Expression is:
[\d]{1,4}
Your final regexp is:
[A-Z]{2}\s[\d]{1,4}
Again, if the input should contain nothing but the flight number, add the ^ and $ symbols as well:
^[A-Z]{2}\s[\d]{1,4}$
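Below is a minimal Python sketch of the flight number check. Python’s `re` module stands in for BRYTER’s engine (an assumption), and `is_flight_number` is a hypothetical helper.

```python
import re

FLIGHT = r"^[A-Z]{2}\s[\d]{1,4}$"


def is_flight_number(text: str) -> bool:
    # re.ASCII limits \s and \d to ASCII whitespace and the digits 0-9.
    return re.search(FLIGHT, text, re.ASCII) is not None


for text in ["LH 3442", "LH 7", "LH3442", "LH 34420", "lh 3442"]:
    print(repr(text), is_flight_number(text))
```

“LH 3442” and “LH 7” pass. A missing space, a fifth digit or lowercase letters cause the input to be rejected.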
If you want to learn more about Regular Expressions, you can find plenty of good online resources to study as well as examples to copy from.