String matching in Google Sheets /Docs /Apps Script

Here I summarize various ways of matching string patterns in selected Google products: Sheets, Docs, and Apps Script. They are ordered from least to most powerful.

  1. String comparison in Sheets, =if(A1 = "cat", "is", "is not") Perhaps surprisingly, this is not literal comparison, but case-insensitive match: the condition also holds true if A1 has “Cat” or “CAT”. This makes one wonder how to actually test strings for equality; I’ll return to this later.
  2. Substring search: find (case sensitive) and search (case-insensitive).
    Example: =find("Cat", A1)
  3. Wildcards in Sheets functions like countif and sumif: ? matches any single character, * matches any number of any characters.
    Example: =countif(A1:A9, "a?b*")
  4. like comparison in query, the Sheets function using the Google Visualization API Query Language. This is similar to the previous: underscore _ matches any single character, percentage symbol % any number of any characters.
    Example: =query(A1:B9, "select A where B like 'a_b%'").
  5. findText method of objects of class Text in Apps Script, extending Google Docs. Documentation says “A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.” In particular, it does not support lookarounds.
    Example:
    var body = DocumentApp.getActiveDocument().getBody().editAsText();
    var found = body.findText("ar{3,}gh").getElement().asText().getText();
    Yeah… I find Google Docs API clumsy, verbose and generally frustrating; unlike Google Sheets API.
  6. regexmatch function in Sheets and its relatives regexextract and regexreplace. Uses RE2 regex library, which is performance-oriented but somewhat limited, for example it does not support lookarounds. It does support capture groups.
    Example: =regexmatch(A1, "^h[ao]t+\b")
  7. matches comparison in query, the Sheets function using the Google Visualization API Query Language. Supports regular expressions (matching an entire string), including lookaheads but apparently not lookbehinds. Not clear what exactly is supported.
    Example: =query(A1:B9, "select A where B matches 'a.b(?!c).*'").
  8. JavaScript regular expression methods are supported in Apps Script… to the extent that they are supported in whatever version of Rhino JavaScript engine that GAS runs on. For example, non-capturing groups are broken and won’t be fixed. Be sure to test your regexes in Apps Script itself, not in a regular JS environment. Update: GAS now offers V8 runtime, which makes the Rhino issues obsolete.

So what about the literal, case-sensitive string comparison in Google Sheets? Apparently, the way to do it is to use regexmatch… Example: =regexmatch(A1, "^cat$") returns True when A1 contains “cat” in lower case.

Ac-Cent-Tchu-Ate spreadsheet arrays

Given a real number {a}, one can find its positive part {a^+} in multiple ways: by definition:

\displaystyle  a^+ = \begin{cases} a,\quad &a>0 \\ 0,\quad &a\le 0\end{cases}

as a maximum {a^+ = \max(a,0)} or by using the absolute value: {a^+=(a+|a|)/2}

This is boring.

You know what else is boring? Spreadsheets.

Google Sheets, in particular. Let’s suppose we have a bunch of numbers in a spreadsheet (in a range like A1:B3), and want a single-cell formula (using arrayformula) that gets the positive parts of all of them.

 1   -2
 4    0
-1    3

The definition-based approach, =arrayformula(if(A1:B3>0, A1:B3, 0)) does the job:

 1    0
 4    0
 0    3

But it’s kind of redundant: the range has to be typed twice. This can be annoying if you are not just taking positive part once but computing something nested with them: {(1-(2-a^+)^+)^+}. Each application of positive part would double the length of the formula.

The absolute value approach, =arrayformula((A1:B3+abs(A1:B3))/2), has the same issue.

The maximum looks promising, because {\max(a,0)} involves {a} only once. Alas, =arrayformula(max(A1:B3,0)) returns a single number (4 in this example), because max just takes the maximum of all entries.

Is there a spreadsheet formula that returns positive part in array-compatible way, and uses the argument only once?


Yes, there is: =arrayformula(iferror(sqrt(A1:B3)^2, 0))

Negative arguments throw an error at the evaluation of the square root. This error is handled by iferror wrapper, which returns the second argument in this case, namely 0. Otherwise, the value of the square root gets squared, bringing back the input.

This can be nested: for example, the formula

=arrayformula(1-iferror(sqrt(1-iferror(sqrt(A1:B3)^2, 0))^2, 0))

clamps the input values between 0 and 1, performing {\min(1,\max(0,a))}. Yes, it’s long — but if the input is itself a 100-character formula, this is still a simplification over the other approaches.

Re: “Democracy on the high seas” by Colin Carroll

A slightly modified version of the riddle discussed in detail by Colin Carroll. I recap the key parts of his description below. Rule 4 was added by me in order to eliminate any probabilistic issues from the problem.

You are the captain of a pirate ship with a crew of N people ordered by rank. Your crew just managed to plunder 100 Pieces of Eight. Now you are to propose a division of the 100 PoE, and the crew will vote on the division. The captain doesn’t vote except to break a tie. If the proposal fails, the captain walks the plank, and the first mate becomes captain, the third in command becomes first mate, and so on. Each pirate votes according to the following ordered priorities:

  1. They do not want to die.
  2. They want to maximize their own profit.
  3. They like to kill people, including crewmates.
  4. They prefer to be on good terms with the higher ranked crewmates.

The question is: what division do you, as the captain, suggest?

Continue reading “Re: “Democracy on the high seas” by Colin Carroll”