Computer classes should teach regular expressions to kids

My latest Guardian column is "Here's what ICT should really teach kids: how to do regular expressions," and it makes the case for including regular expressions in foundational IT and computer science courses. Regexps offer incredible power to normal people in their normal computing tasks, yet we treat them as deep comp-sci instead of something everyone should learn alongside typing.

I think that technical people underestimate how useful regexps are for "normal" people, whether a receptionist laboriously copy-pasting all the surnames from a word-processor document into a spreadsheet, a school administrator trying to import an old set of school records into a new system, or a mechanic hunting through a parts list for specific numbers.

The reason technical people forget this is that once you know regexps, they become second nature. Any search that involves more than a few criteria is almost certainly easier to put into a regexp, even if your recollection of the specifics is fuzzy enough that you need to quickly look up some syntax online.




    1. It turns out if you click “like” a bunch of times it just toggles the ‘liked’ state, rather than incrementing it upwards. But I’ll keep trying.

  1. I would love to learn how to use regular expressions.  That knowledge would save me an astonishing amount of time but I’ve never seemed to be able to get around to it.

    Can anyone suggest a good (bitesize) book on the subject?  Preferably one which assumes you’re an idiot to start with. :)

    1. Regex – a half hour to learn and a lifetime to master.

      I’ve found Jeffrey E.F. Friedl’s (O’Reilly & Associates) ‘Mastering Regular Expressions’, to be helpful over the years.  I’ve got the first edition, it’s up to the 3rd, now. 

      There’s also quite a bit of web material available, which includes plenty of useful examples.  Just google regex examples or regular expressions examples.

      I’ve found this useful too: it’s called The Regex Coach. It lets you play with regular expressions outside of a language environment.

    2.  It’s sad that sincerely trying to help people never seems to get many “like”s.  LIKES FOR EVERYONE

    1. Once you learn any version and comprehend the concept, you can zoom to a new regex version with the help of a cheat sheet. Really, as long as you teach any version of regex to kids, they’ll be able to figure the rest out on their own.

  2. I cannot agree with you more. Regular expressions are *fun*. They’re like finding questions to answers you already know: “A US telephone number looks like xxx-xxx-xxxx. How can this be checked?”

    You don’t need to get into complications about different implementations, or how \w differs from \S, or lookaheads or lookbehinds. Simply being able to represent a harmless structure–a sentence, a title, a pattern–will help sharpen the way the mind operates.
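
A minimal sketch of the telephone-number check, written in Python rather than the Perl used elsewhere in this thread. The function name `looks_like_us_phone` is made up for illustration, and the pattern only tests the xxx-xxx-xxxx shape, not whether the number actually exists.

```python
import re

# Three digits, a hyphen, three digits, a hyphen, four digits.
US_PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def looks_like_us_phone(text):
    """Return True if the whole string has the shape xxx-xxx-xxxx."""
    # fullmatch() insists that the pattern covers the entire string.
    return US_PHONE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["555-867-5309", "5558675309", "555-867-530"]:
        print(candidate, looks_like_us_phone(candidate))
```

Using `fullmatch` rather than `search` is a deliberate choice here: a checker should reject a string that merely contains something phone-shaped.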

    1. … especially true for lookarounds, which in terms of formal languages are not regular at all. I often found that when I seemed to need those, the problem is better solved by not squeezing it into a single expression.

      1. Completely agree. Frankly I’ve never been able to think in terms of “lookaround / negative lookaround”–I can’t train my brain that way–but maybe, with these kids, there’s hope…!

  3. Regular Expressions are an abomination. Quite useful in the right situation, but so hard to set up and maintain that they should be banned, not inflicted on innocent children. Someone really needs to come up with a user friendly alternative.

    1.  Absolutely.  These computer tasks should be as user-friendly and easy to understand as possible.

      COBOL is absolutely the way to go.  As an added bonus, you can use the resulting programs for book reports!

      1. Some people think that limiting the number of keystrokes, and thus speeding up the typing of a computer program, is the most important aspect of a language. And Regex is nothing if not compact. But other more experienced programmers know better: a useful program is written from scratch once then maintained a hundred times, so understandability (especially if you are not the original programmer) and reusability are key. Regex is non-procedural, non-objectified and uncommentable, to name just three reasons it is a nightmare to maintain.

        1. “to name just three reasons it is a nightmare to maintain.”

          I don’t think that we are talking about use in professionally maintained code here. Something more like a couple of lines which only run once – ever – and are tossed.

        2. Of course there are unreadable and overcomplex uses of regexp, as with every other technology. But just as well there are tasks for which they are the optimal tool, and “optimal” here is more of a fact than a matter of opinion.

    2. Regular expressions are no more of an abomination than arithmetic. They’re both systems of fairly-arbitrary symbols that have specific, formal, actually-pretty-simple when-you-get-down-to-it meanings. People just don’t tend to be as familiar with the meaning of “|” as that of “+”.

      Hell, I’d say regular expressions are actually simpler than arithmetic. In a technical sense, at least — I’m pretty sure you could define an embedding of formal regex in formal arithmetic, but not vice-versa.

    3. I’m not sure there’s a “user friendly” solution to providing a tool set that allows matching of any/all arbitrary possible language patterns. 

  4. But how will they be prepared to be knowledge workers in today’s competitive digital economy if they haven’t been drilled into memorizing the menu of a version of Word that will be a decade old when they graduate?



    Really, the best way to learn regular expressions is to actually use them though.  Find some language that offers a real Regex library (not the bullcrap Posix one) and try out different features until you get the hang of it.  Perl made a name for itself with its built-in regex code, so it is not a terrible place to start.  A simple regex match program might look something like this:


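    use warnings;
    use strict;

    my $regex = 'regex pattern here';

    # Read each line typed on standard input and test it against the pattern.
    while (<STDIN>) {
        if ( $_ =~ /$regex/ ) {
            print("Match\n");
        }
        else {
            print("Not a Match\n");
        }
    }
    exit 0;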

    When you run this program, it will tell you if each line you type is matched by your pattern or not.  Edit the pattern and try different things; it’s easy.  There are free Perl interpreters for all major platforms (ActiveState on Windows, and it ships by default on most flavors of Linux and Mac).

    Or if you’re already proficient in some language and want to use that instead, be my guest.  Look for “perl compatible” regular expressions if you can, and if the documentation starts talking about patterns that look like [[:xdigit:]] then look somewhere else. 

    A big roadblock to Regular Expressions in the past has been that the default regular expression library in Unix is terrible.  It’s buggy, slow, difficult to use, has very poor diagnostic capabilities, and even worse documentation.  Luckily better Regex libraries are available now and many modern languages use them quite well, but there are always some holdouts that don’t want to break some mythical program that used the weird misfeatures in the terrible old library. 

    1. Bwah-ha-ha!  Yes, PERL is a very powerful language and if you feel the need to create an unreadable syntactic hash it is the language of choice: although APL and SNOBOL could both give it a run for its money! Of course, you can write “secret code” in any computer language but, like anything else you might want to do, PERL makes it so easy.  Probably 90% of the impetus behind Ruby was to create a language of roughly the same power but with fewer opportunities to hang yourself on the grammar. 

      1. And that’s why I bought the first ruby book a decade ago.  Too bad the runtime still sucks.

        It is possible to write bad, old-school perl, but modern perl syntax is quite readable and there are even automated tools to critique style.

        Then again, perhaps we should ban English…

  6. Yes! I can’t agree with this enough.

    In a previous career I was an office drone for a big insurer and I got lumbered with a very tedious task involving updating a spreadsheet with the names of a load of financial advisers. Thanks to a VBA regex I was able to reduce a boring half-hour job to a 1-second one.

      1. If people weren’t willing to disregard the cost/benefit angle now and then we’d all be living in caves eating dirt.
        Keep in mind that previous versions of Excel required substantial tinkering to accomplish things like comparing two lists. 

        1. Actually, if people weren’t paying attention to the cost/benefit angle, we’d all be extinct. You’ve got your causal relationship exactly backwards.

      2. About that. However, this was a weekly job, so after six weeks it resulted in a net saving of time, not to mention the fact that the function I wrote was a lot more accurate and reliable than doing it manually.

      3. RLY?  It was a boring half-hour job that has to be repeated every week.  Or day.  There’s a Rule of 10s:  Do it once by hand; if you need it 10 times, get a script;  if you need it 100 times write compiled code.  (or thereabouts)

        1. Yes, RLY. Because you might note there was no mention of the task being repeated in the original comment, and I’m sure we all know people who gleefully ignore the cost:benefit considerations if it lets them justify playing with the $SHINY while getting paid.

      4.  which will then be recouped in 6 report runs and forever thereafter be ‘bonus’ time.

        how is time invested in automating a process EVER seen as a bad thing??

  7. I use this site for reference sometimes

    Totally agree with Cory here.  I see some hairy nasty tears-inducing regexp in the wild but there are a ton of applications out there that would be easily solvable with some simple regexp.

    The way I learnt them is the same way I’ve learnt most computer things over the last 15 years or so: baby steps from square one, start with the easy examples, actually build and execute the regexp somewhere as you are learning, consult a search engine if you run into a little trouble, and respectfully ask in a user group or other online community if you run into a lot of trouble.

  8. Regex has made me look like some kind of dark magic wizard on more than one occasion.  That said, I wish there were more dark magic wizards.  World efficiency would jump up dramatically the minute such a generation entered the workforce.

  9. Even after reading the article, I’m still unconvinced (as a self-described “technical person”) that RegEx is something that regular people would find very useful. I think I’ve got a handle on its purpose and application but it’s not something I often want or need outside of the standard uses for validating user input. How will the receptionist implement his RegExp? Where will its output go? How will the receptionist get the output into their Excel spreadsheet? What sort of system is the mechanic using that won’t already let him search for specific part numbers?

    This seems analogous to arguing that everyone should know how to write their own programs in case the interface of the one they’re using doesn’t provide the features they need.

    This is the first time I’ve ever seen anyone claim they become second nature with regular use. Mostly I hear things like “arcane”, “confounding” and “bewildering”. Indeed when searching for a good expression for validating things like email addresses, rather than a definitive answer, Google typically shows me arguments over different versions that authors claim will properly handle this or that particular oddball circumstance better than others.

    1. In high school, I worked at a small-town newspaper as basically a data entry drone, and saved days of work all the time with regexes. A high school would send 2,000 graduates’ names, we’d run that for some reason, but the school sent it First Name Last Name and the policy was to run it Last Name, First Name (or vice-versa, who cares?). So I’d be told to copy and paste the names one at a time into the right order, and I’d write a regex instead.

      A shockingly large amount of office work seems to consist of stuff like that: Here’s a large set of partially formatted data, we actually wanted it formatted that way, or we want just these entries, or there’s data in 10 forms we need to normalize into the same format. A lot of the world is too messy for a computer to figure out what needs to be done, but extremely repetitive once you see the particular pattern as a human. I think the average person is more likely to benefit from regexes than programming.
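
The name-swapping job can be sketched in a few lines of Python. This is an illustration under stated assumptions, not what the newspaper actually ran: it assumes one name per line with exactly two words, and `last_name_first` is a hypothetical helper name. Lines that do not fit that shape (middle names, suffixes, multi-word surnames) are left untouched for a human to review.

```python
import re

# One "First Last" name per line: two runs of non-space characters
# separated by spaces or tabs. MULTILINE makes ^ and $ work per line.
NAME_LINE = re.compile(r"^(\S+)[ \t]+(\S+)$", re.MULTILINE)


def last_name_first(roster):
    """Rewrite every two-word line as 'Last, First'."""
    # \2 is the second captured word (surname), \1 the first (given name).
    return NAME_LINE.sub(r"\2, \1", roster)


if __name__ == "__main__":
    print(last_name_first("Ada Lovelace\nAlan Turing\nMary Ann Smith"))
```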

    2. The difficulty in parsing email addresses with regex is indicative of the shocking complexity of email address syntax, not a flaw in regular expressions. The canonical, complete email address regex would be the one here. That may look long and complicated, but it’s probably the concisest possible expression of the email address spec — the spec itself runs to thousands of words.

      As for the receptionist, well — regexes don’t produce output, per se. They’re a way of describing classes of character sequences, so you can search through large bodies of text to see if (and where) any sequences matching the pattern appear.

      For a concrete example for your receptionist: find all the British postcodes (zipcode-equivalent) in a long document. A document so long it would take hours to go through by hand. Or, a few minutes of assembling the regex: /[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{2}/
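
As a rough sketch, here is that postcode search in Python, with the backslash escapes (`\d` for a digit, `\s` for whitespace) written out. The helper name `find_postcodes` and the sample text are invented for illustration, and the pattern matches postcode-shaped strings only; it does not check that a postcode really exists.

```python
import re

# Outward code: 1-2 letters, 1-2 digits, optional letter.
# Inward code: optional whitespace, one digit, two letters.
POSTCODE = re.compile(r"[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{2}")


def find_postcodes(document):
    """Return every postcode-shaped string in the document, in order."""
    return POSTCODE.findall(document)


if __name__ == "__main__":
    sample = "Write to London SW1A 2AA, or to Broadcasting House, W1A 1AA."
    print(find_postcodes(sample))
```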

      Regex can be applied to search-and-replace too, if you want something you can call output: “find all the strings shaped like this, and replace them with that, where what exactly ‘that’ is depends on what ‘this’ is”. Perhaps your receptionist needs to sort out myriad appearances of “e. e. cummings”, “EE cummings”, “ee. cummings” etc, wherever they appear in a biography… s/[eE]\.?(\s*)[eE]\.?(\s*)[cC]ummings/E.\1E.\2Cummings/. Done.
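
A hedged Python rendering of that search-and-replace, assuming the dots are meant as literal periods and the two groups capture whatever spacing the writer used. `normalise_cummings` is a made-up name. Note that, like the one-liner it mirrors, the pattern has no word-boundary guard, so it would also fire on the tail of a phrase such as “see cummings”.

```python
import re

# Each initial may or may not be followed by a period; the captured
# whitespace is put back unchanged so the original spacing survives.
CUMMINGS = re.compile(r"[eE]\.?(\s*)[eE]\.?(\s*)[cC]ummings")


def normalise_cummings(text):
    """Rewrite every variant spelling as capitalised, dotted initials."""
    return CUMMINGS.sub(r"E.\1E.\2Cummings", text)


if __name__ == "__main__":
    for variant in ["e. e. cummings", "EE cummings", "ee. cummings"]:
        print(variant, "->", normalise_cummings(variant))
```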

      And as for your mechanic, no doubt he can search for specific part numbers without regex. But he might not be able to search for “all parts of the form ASDF-[some digits]-FEB2011”, say, and that could be really handy when it turns out that supplier ASDF has to do a product recall on everything they made in February 2011.
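
The recall search might look like this in Python. The part-number layout is taken from the example above; the helper name `recalled_parts` and the sample parts list are hypothetical.

```python
import re

# Supplier prefix, a run of digits, then the manufacture-date suffix.
# \b keeps the match from starting or ending in the middle of a longer code.
RECALLED = re.compile(r"\bASDF-\d+-FEB2011\b")


def recalled_parts(parts_list):
    """Return every part number of the form ASDF-<digits>-FEB2011."""
    return RECALLED.findall(parts_list)


if __name__ == "__main__":
    stock = "ASDF-1042-FEB2011, ASDF-1042-MAR2011, QWER-7-FEB2011, ASDF-88-FEB2011"
    print(recalled_parts(stock))
```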

    3. think of it as extremely flexible search and replace, for starters.  lots of word processors have this ability, for example, but many people don’t even know to look for it and instead do a significant amount of shovelwork in a file to change a syntax or something else.  Regex will open your mind to the possibilities of being lazy and saving a load of time through automation.

      I regexped a lot of what I had to do the rest of the day, so I’m off to play Borderlands 2 now.

  10. That’s unexpected – as a French IT engineer, I learnt RegEx three years before graduation. I even implemented them.

    Is it really considered as an advanced concept in US/… ?

    1. I don’t think it’s a matter of being too advanced. My impression is that, right or wrong, it’s viewed as a side issue. It’s a tool like database access libraries that you may or may not use in everyday computing.

    2. I think the article is making the case for teaching regex in computer training for ‘non-technical’ people, who still use computers, but aren’t programmers or engineers. I would imagine US engineering schools teach regex in their computing disciplines. Although I’m a Mech Eng, and I never learned this in my first year CS classes. But then I’m not from the US, either.

  11. Glad to see you mention both typing and RegEx.  Typing is the greatest productivity tool in the IT professional’s toolbox.  At 30 wpm I am usually able to get out of the office an hour sooner than my equally tasked hunt-and-peck colleagues.  Everybody seems to swoon about every new syntax that saves you two characters on the odd line of code in the fantasy that these savings will build up into a general increase in GDP but they won’t take the six orders of magnitude greater increase in productivity that comes with actually knowing how to work the controls on the machine that you spend much of your life in front of.

    RegEx is almost as important.  I’ve lost track of the number of meetings I’ve been in where people cry “We can’t do that!  It would require changing thousands of lines of code!” and I reply “It means one more run of the test suite and 5 minutes of coding.”  The idea that computer programs are text which, since they have to be perfectly syntactically regular in order to be compiled, can be parsed and manipulated by something a bit more automated than a hunt-and-peck typist is always a revelation.

  12. Agreed.  Somewhere there’s an old Slashdot post where I argued that fifth grade was the best time to teach regex concepts.  Kids will surprise you – I had no idea when I took the Rec Dept’s computer class that it was too hard for us fourth graders because they were teaching assembly on a VIC-20 (the memory expansion for BASIC never came in).  I only learned in College CS that assembly was an advanced concept.  So … now I’m straight on that. :P

  13. They should teach them to MS SQL DB analysts too. I was explaining to a couple of DB analysts about how we would remove the number suffixes from 30,000+ addresses in our database (i.e. 3rd ave becomes 3 av) without messing up the street type abbreviations (av, rd, wy, ct). When I said regular expressions I got a blank look. One said “we can do this with find and replace”…. Yeah, good luck with that. 
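
To see why plain find-and-replace struggles here, consider a sketch of the ordinal-stripping step in Python (not MS SQL, and not the commenter's actual solution). It assumes the suffixes to drop are st, nd, rd and th directly after digits; `strip_ordinals` is an invented name, and shortening “ave” to “av” is a separate step not attempted here.

```python
import re

# Digits immediately followed by an ordinal suffix. Because the suffix
# must be glued to a number, a street type such as "rd" or "st" standing
# on its own is never touched.
ORDINAL = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b", re.IGNORECASE)


def strip_ordinals(address):
    """Turn '3rd' into '3' while leaving standalone 'rd', 'st', etc. alone."""
    return ORDINAL.sub(r"\1", address)


if __name__ == "__main__":
    for address in ["123 3rd ave", "9 42nd st", "10 oak rd"]:
        print(address, "->", strip_ordinals(address))
```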

    1. Are they really called MS SQL DB analysts?  If you’re getting a response like that then they should take out their SQL DB part.  In fact, just titling them “OFFICE CLERK” would be better.  Anyone who claims to know how to use SQL but doesn’t understand regular expressions, doesn’t.
      I hope I misunderstand and they’re just students.

      1.  1. You can do a whole lot with SQL without knowing regular expressions.  I can’t even think of any routine DB admin tasks that require regexp.  Now, calling yourself a “DB analyst” without knowing regexp…that’s a bit of a stretch, admittedly.
        2. You apparently don’t have too much experience with developers educated within the Microsoft ecosystem.  Andy’s account sounds about right.

        1. This probably won’t be read but, meh.

          1) Yes.  You can get away without knowing regexp, and do your job fine in any general DB situation. BUT from my view there’s a move away from relational databases and toward systems like Cassandra.  Regular expressions are thus becoming more important.  Not knowing about them shows what type of analyst you are.  I’m not saying you have to be good with them; at minimum, I expect you to know what they are.

          2) I do.  It still saddens me.  If it’s up to me, in a small company, you won’t get hired or you’re only here in a temporary basis.  If we’re thinking about hiring one, my argument would be hiring someone that knows jack will cost less plus a focus on teamenship evaluations will be less stress on the team.  If it’s a large company, HR and I will be at odds–what else is new?

      2. cservant: Sadly they weren’t students. One is a full time “Senior Systems Analyst” and the other was a temp DB analyst. I refuse to call her a “Senior Systems Analyst” because she knows zip about systems. She’s a very basic DB admin who only knows a very narrow set of MS SQL Server.

  14. An even better idea would be to make basic computer literacy a core part of the primary curriculum. I can’t tell you how many college freshmen I’ve met in the last 3 years who have never so much as opened a browser, sent an email or written an essay in MS Word (yes, yes, I know there are other, better word processing apps out there, baby steps people!).

    Our education system is based on the assumption that the kids these days have magically picked up basic computer literacy via osmosis and so no one bothers to double check and make sure they can actually navigate a GUI with a mouse. Do that, then you could teach RegEx in high school.

  15. Cory, have you ever actually met a member of the ‘unwashed masses’?
    I’d be happy to find a programmer using regular expressions, instead of writing hundreds of lines of buggy code.

    1. Regex is useful, but not for parsing HTML (it’s just not powerful enough).  There is no contradiction or even tension here.

      1. Gawd, I wasn’t implying any contradiction or “tension”. Like any other tool, it has its right and wrong uses.

        Relax, Francis… the link I provided explains itself. Sheesh, step away from the computer, go take a walk in a park or something…

      2. I believe there’s a world-famous StackOverflow on that… … Got it:

  16. I am so with you, Cory! I think that all kids today should be learning 1) a programming language, 2) basic use of Regex, 3) Search engine modifiers for at least one search engine (which can sometimes be combined with regex, depending on the search engine) 4) basic systems administration concepts in an OS agnostic fashion and 5) a broad overview of security and privacy issues with networked computers and all the data in them.

    If you teach these things correctly, you can teach the specifics of one language/regex version/search engine/OS while giving a more general understanding of how things work such that students will be able to learn other versions more easily. Just as learning grammar in any one language makes learning grammar in another language easier, all of these things link together under the surface to build the foundation for ongoing learning as needed.

  17. for my sins, i make a living writing and maintaining back-office code in perl, python, and php. i agree with the #1 commenter — overly enthusiastic use of regexes just lands you with one more problem than you had before. regexes have their place, i do use them myself, but they are just not maintainable — not even for their original author over a sufficient time span.

    kids in programming-related classes should be taught plenty of things they’re not, though. my own favorite: how to use some simple, command-line version control system. SVN or Git or somesuch, doesn’t matter which one. just get them used to the concepts of version history, checking-in and reverting, branching and merging. this’ll be invaluable as soon as they have to collaborate on any programming project, and most of them will see the use of it the second they trash a file by mistake.

  18. While I support teaching computer literacy, it may be that regexp are over the top.

    I would prefer if our grad students had an idea what a for-loop is, and how to avoid it. If you got them there, you can teach regexp.

    Cory, if you can manage to get your message across to the game developers at some major studios, I’d be glad! Last game I had time to play was DeusEx, the first one, once upon a time. There were in-game ‘hacking’ sequences, I remember. They had nothing to do with programming. Let’s fix this, in future. :)

    1.  Yes!  This drives me crazy.  Fallout 3 and Bioshock have the same problem.  Especially when I think of all the fun little problems I’ve done on job interviews; it wouldn’t exactly be hard to find real CS challenges to work into games like Deus Ex and similar.

  19. Also, fifth grade would be a good time to introduce statistics, which rely on not much more than arithmetic. The use of calculus to teach stats seems to be nothing but a way to create an elite class of statisticians.  In fact, many of them live a life like George Jetson, punching the same button a few times a day for 20 years. And anyone that retained all that calculus longer than was needed to pass the midterm and final is going to be like Dustin Hoffman in “Rain Man.”

  20. Seeing as we still fail to teach most kids even basic science literacy before they get out of high school (or college, for that matter)   I think we have more pressing concerns than regular expressions.

  21. Hard to tell if you’re joking. Pretty much none of them will remember how to do a regex after a week, even of the few that will understand it. Not unless you make it as much a part of the curriculum as basic language skills. Regex is a true nerd tool: I’m fairly smart (at least enough to understand software development), yet I still have to look up regex or use one of those handy apps on the odd occasions I need to write one properly, and I’m pretty sure I’m not the only one. There is nothing intuitive about its syntax. Why not force spaghetti Perl and Python on them while you’re at it?

    If you want to see which kids are interested in programming, there are better ways to start. Forcing regex in a basic high-school level computer science class is probably enough to put many talented kids off programming for life.

    You do bring up a good point, though. Maybe we should stop teaching them PowerPoint — leave that for the MBA classes.
