The 'Rules' for string.match



Blackfoot
July 9th, 2015, 00:34
OK, so I'm working with some code that uses string.match, and I really need to get a grip on exactly how it works. I've been fudging my way through it until now with reasonable success, but I'd really like to fully understand what all the different codes mean. I found an online reference, but it seems a bit incomplete and its explanations are somewhat incomprehensible.

I'm using string.match to read through the attack line in NPCs. The code for 3.5 looks like this:

local sDamageRoll, sDamageTypes = string.match(aDamageAttrib[1], "^([d%d%+%-%s]+)([%w%s,]*)");
sCrit, sDamageTypes = string.match(aDamageAttrib[nAttrib], "^x(%d)([%w%s,]*)");
sDamageTypes = string.match(aDamageAttrib[nAttrib], "^%d+%-20%s?([%w%s,]*)");

I get that the d's are 'all digits', that the % is used to identify special characters, and that the w's are 'all alphanumeric characters', though I'm not sure what that means...
Anyway...
In the 1st example...
^ means start at the beginning of aDamageAttrib[1]. The ()'s are a grouping element, I think, and maybe the []'s are too? The first d seems to actually be a literal d, probably for dice, and the second is looking for digits. The %+ and %- are looking for plus or minus signs; if they appear like that, are they 'optional'? The %s is looking for a space. I'm not sure what the + is doing there... my understanding is pretty much starting to disintegrate. :) Help?

darrenan
July 9th, 2015, 01:53
Patterns Tutorial: https://lua-users.org/wiki/PatternsTutorial
Patterns Gory Details: https://www.lua.org/manual/5.2/manual.html#6.4.1

Character classes begin with %: %d represents any digit, %w any alphanumeric character (a letter or a digit), etc. Most other characters are just literals; the exceptions are the magic characters ^ $ ( ) % . [ ] * + - ?
Parentheses define 'captures', which are explained in the second link above. Captures become part of the output of string.match: whatever is matched within the parens is returned by string.match and can then be retrieved.
+ means one or more repetitions of the preceding character or set, * means zero or more repetitions (the longest match), - means zero or more repetitions (the shortest match), and ? means zero or one.
% is also used to 'escape' special characters. For instance, if you wanted a % character you would use %% since % by itself is a special character.
The [] enclosure is used to combine sets, so for instance [%d%a] would match all digit characters and all letter characters, which is equivalent to %w. Similarly, [%l%u] is equivalent to %a.
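
As a rough illustration of the points above, here is a minimal sketch run against made-up sample strings (none of them come from the ruleset code; the variable names are just for the example):

```lua
-- All input strings here are invented for illustration.

-- Captures: each pair of parentheses becomes one return value.
local sCount, sSides = string.match("2d6", "(%d+)d(%d+)")
print(sCount, sSides)            --> 2    6

-- + needs at least one character from the class; * accepts none.
local sPlus = string.match("abc", "%d+")
local sStar = string.match("abc", "%d*")
print(sPlus)                     --> nil
print(sStar == "")               --> true (an empty match, not nil)

-- - also means zero or more, but stops at the shortest match.
local sShort = string.match("<a><b>", "<(.-)>")
local sLong = string.match("<a><b>", "<(.*)>")
print(sShort, sLong)             --> a    a><b

-- %% matches a literal percent sign.
local sPercent = string.match("50% off", "(%d+)%%")
print(sPercent)                  --> 50

-- [%d%a] and %w accept the same characters.
local sSet = string.match("abc123!", "[%d%a]+")
local sClass = string.match("abc123!", "%w+")
print(sSet, sClass)              --> abc123    abc123
```

Note that when a pattern has no captures, string.match returns the whole matched text instead.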

Blackfoot
July 9th, 2015, 02:00
I was already using that Patterns Manual page.. the Tutorial seems to help 'some'.. although I'm not sure I can completely explain my examples using it.. I'll keep reviewing it and see if I can figure it using the tutorial..

darrenan
July 9th, 2015, 02:08
What Lua calls "patterns" are more commonly referred to as "regular expressions"; C, C++ and C# all use that term. Searching on that term might give you additional helpful links.

Trenloe
July 9th, 2015, 02:18
Be wary of comparing Lua pattern matching to standard regular expressions: they're not the same. Don't expect something that works as a regular expression to work as a Lua pattern. It might, but more often than not it won't (especially if the pattern is complex).
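
To make a few of those differences concrete, here is a small sketch with made-up strings. It only covers three common cases (escapes, alternation, repetition counts) and is not a complete list:

```lua
-- Regex-style syntax that a Lua pattern treats as plain text.
-- All input strings are invented for illustration.

-- Escapes use %, not a backslash. The pattern "\\d+" means a literal
-- backslash followed by one or more letter d's.
local sRegexEscape = string.match("123", "\\d+")
local sLuaEscape = string.match("123", "%d+")
print(sRegexEscape, sLuaEscape)     --> nil    123

-- There is no alternation: | is an ordinary character.
local sAlt = string.match("dog", "cat|dog")
local sLiteralBar = string.match("cat|dog", "cat|dog")
print(sAlt, sLiteralBar)            --> nil    cat|dog

-- There are no {n} repetition counts: the braces are ordinary characters.
local sBraces = string.match("aaa", "a{2}")
local sLiteralBraces = string.match("a{2}", "a{2}")
print(sBraces, sLiteralBraces)      --> nil    a{2}
```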

Blackfoot
July 9th, 2015, 02:24
So based on what you are saying I'm going to try and break these down..

"^([d%d%+%-%s]+)([%w%s,]*)"
Start at the beginning of the string...
First Capture ... (d<digit>+-<space>) ...
Second Capture ... (<all spaces and alphanumeric characters> ... Not really sure what the comma and * are doing.
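
A quick way to check this breakdown is to run the pattern against some sample text. The attack strings below are made up, so treat this as a sketch; the real contents of aDamageAttrib[1] may be formatted differently:

```lua
-- Sample attack text is invented; real NPC data may look different.
local sPattern = "^([d%d%+%-%s]+)([%w%s,]*)"

-- Inside [], %+ and %- are literal plus and minus signs, so the set accepts
-- 'd', digits, '+', '-' and whitespace. The + after ] means "one or more".
local sRoll, sTypes = string.match("1d8+2 slashing, fire", sPattern)
print("[" .. sRoll .. "]", "[" .. sTypes .. "]")    --> [1d8+2 ]    [slashing, fire]

-- * means "zero or more", so the second capture can be an empty string.
local sRoll2, sTypes2 = string.match("2d6-1", sPattern)
print("[" .. sRoll2 .. "]", "[" .. sTypes2 .. "]")  --> [2d6-1]    []

-- ^ anchors the match at the start, so text that does not begin with a
-- character from the first set fails and string.match returns nil.
local sRoll3 = string.match("slashing 1d8", sPattern)
print(sRoll3)                                       --> nil
```

The first capture keeps the trailing space because whitespace (%s) is part of the first set.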

"^x(%d)([%w%s,]*)"
Start at the beginning...
First Capture ... (<digits>) after an x
Second Capture ... (<all spaces and alphanumeric characters> ... Pretty much the same as the first one here.. hrm..
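
The same kind of check for the second pattern, again with invented sample strings:

```lua
-- Sample strings are invented for illustration.
local sPattern = "^x(%d)([%w%s,]*)"

-- A literal x, then exactly one digit, then any run of
-- alphanumerics, whitespace and commas.
local sCrit, sTypes = string.match("x3 piercing", sPattern)
print("[" .. sCrit .. "]", "[" .. sTypes .. "]")    --> [3]    [ piercing]

-- %d with no quantifier matches exactly one digit, so a second digit
-- ends up in the second capture.
local sCrit2, sTypes2 = string.match("x10", sPattern)
print(sCrit2, sTypes2)                              --> 1    0

-- No leading x means no match at all.
local sCrit3 = string.match("19-20", sPattern)
print(sCrit3)                                       --> nil
```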

"^%d+%-20%s?([%w%s,]*)"
So the first part of this one is the match...
Start at the beginning and look for digits. I don't quite get the %d+; is this looking for multiple sets of numbers? Then -20 and a space... I'm not sure I understand the ? either. Then the same code again for the capture. Yeah, not quite there yet.
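
For what it's worth, %d+ is one run of one or more digits, and ? makes the item before it optional (zero or one). A sketch with made-up sample strings:

```lua
-- Sample strings are invented for illustration.
local sPattern = "^%d+%-20%s?([%w%s,]*)"

-- %d+ matches "19", %- is a literal minus sign, then "20",
-- then %s? consumes the single optional space.
local sTypes = string.match("19-20 slashing", sPattern)
print(sTypes)               --> slashing

-- The space is optional, and the capture may be empty.
local sTypes2 = string.match("18-20", sPattern)
print(sTypes2 == "")        --> true

-- Text that does not start with digits fails outright.
local sTypes3 = string.match("x2", sPattern)
print(sTypes3)              --> nil
```

Only the part in parentheses is returned; the threat range itself is matched but not captured.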

Blackfoot
July 9th, 2015, 03:19
OK.. wait so a comma isn't a special character... so that last capture is capturing multiple alphanumerics with spaces separated by commas... this is starting to make more sense.
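
A small sketch that seems to confirm this, using a made-up string: with the comma in the set the capture runs across the commas, and without it the capture stops at the first one.

```lua
-- Sample string is invented for illustration.
local sInput = "fire, cold; acid"

-- The comma is an ordinary character inside the set, so the capture
-- continues across it and stops at the semicolon.
local sWithComma = string.match(sInput, "^([%w%s,]*)")
print(sWithComma)           --> fire, cold

-- Without the comma in the set, the capture stops at the first comma.
local sNoComma = string.match(sInput, "^([%w%s]*)")
print(sNoComma)             --> fire
```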