String matching in Google Sheets /Docs /Apps Script

Here I summarize various ways of matching string patterns in selected Google products: Sheets, Docs, and Apps Script. They are ordered from least to most powerful.

  1. String comparison in Sheets, =if(A1 = "cat", "is", "is not") Perhaps surprisingly, this is not literal comparison, but case-insensitive match: the condition also holds true if A1 has “Cat” or “CAT”. This makes one wonder how to actually test strings for equality; I’ll return to this later.
  2. Substring search: find (case sensitive) and search (case-insensitive).
    Example: =find("Cat", A1)
  3. Wildcards in Sheets functions like countif and sumif: ? matches any single character, * matches any number of any characters.
    Example: =countif(A1:A9, "a?b*")
  4. like comparison in query, the Sheets function using the Google Visualization API Query Language. This is similar to the previous: underscore _ matches any single character, percentage symbol % any number of any characters.
    Example: =query(A1:B9, "select A where B like 'a_b%'").
  5. findText method of objects of class Text in Apps Script, extending Google Docs. Documentation says “A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.” In particular, it does not support lookarounds.
    Example:
    var body = DocumentApp.getActiveDocument().getBody().editAsText();
    var found = body.findText("ar{3,}gh").getElement().asText().getText();

    Yeah… I find Google Docs API clumsy, verbose and generally frustrating; unlike Google Sheets API.
  6. regexmatch function in Sheets and its relatives regexextract and regexreplace. Uses RE2 regex library, which is performance-oriented but somewhat limited, for example it does not support lookarounds. It does support capture groups.
    Example: =regexmatch(A1, "^h[ao]t+\b")
  7. matches comparison in query, the Sheets function using the Google Visualization API Query Language. Supports regular expressions (matching an entire string), including lookaheads but apparently not lookbehinds. Not clear what exactly is supported.
    Example: =query(A1:B9, "select A where B matches 'a.b(?!c).*'").
  8. JavaScript regular expression methods are supported in Apps Script… to the extent that they are supported in whatever version of Rhino JavaScript engine that GAS runs on. For example, non-capturing groups are broken and won’t be fixed. Be sure to test your regexes in Apps Script itself, not in a regular JS environment.

So what about the literal, case-sensitive string comparison in Google Sheets? Apparently, the way to do it is to use regexmatch… Example: =regexmatch(A1, "^cat$") returns True when A1 contains “cat” in lower case.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s