[5E] sanitize(s) method question



Varsuuk
January 6th, 2021, 05:18
I copied the sanitize method from the 5E ruleset and added it to my StringUtils package.

While testing cases, I tried passing in "Magic-User", expecting it to be converted to my classID of "magic_user". Instead, the returned string was "magic-user".



-- Used to convert non-xml names to valid xml names.
-- Replaces invalid characters with "_". In addition, converts string to lower-case.
-- @args s String to be scanned for characters to replace.
-- @returns A string with the indicated characters replaced by "_"
function sanitize(s)
    local sSanitized = StringManager.trim(s:gsub("%s%(.*%)$", ""));

    -- @TODO Review (posted on forums): I added the "-" at start of the character set since "-" was not being replaced by "_"
    sSanitized = sSanitized:gsub("[-.,-():'’/?+–]", "_"):gsub("%s", ""):lower();

    return sSanitized
end


NOTE that I added the "-" before the "." to get it to work; it wasn't there before. I am working on a high-res monitor with smallish print and lo-res eyes, but it seems to me there are 2 "hyphens" in that string (ignoring the one I added): one before the "(" and one at the end before the "]". The latter looks a teeny bit wider than the former.

Is it because one of those is being misinterpreted as a special character/escape when inside the "[]" construct?

It is "working" now, BUT I am sure what I did is not the proper solution, AND for all I know I broke some OTHER case by doing it.


More REGEX-y peeps, please chime in (remember from my earlier post that I am Regex-challenged).

damned
January 6th, 2021, 13:00
Probably something to do with magic characters

https://www.lua.org/pil/20.2.html

Varsuuk
January 6th, 2021, 22:18
Yup Damned - you are right :)

I put a "%" in front of the first hyphen and it worked:
sSanitized = sSanitized:gsub("[.,%-():'’/?+–]", "_"):gsub("%s", ""):lower();
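A minimal sketch of why the "%" matters, using plain Lua outside Fantasy Grounds. The helper name `replace` is hypothetical, and the non-ASCII ’ and – are left out of the set to keep the example ASCII-only. Inside "[]", an unescaped "-" between two characters is read as a range, so ",-(" means "from , to (", which is empty.

```lua
-- Inside [...], "x-y" denotes a range. ",-(" is the range from "," (byte 44)
-- to "(" (byte 40), which is empty, so none of "," "-" "(" is matched.
local unescaped = "[.,-():'/?+]"
-- With "%-" the hyphen is a literal member of the set.
local escaped   = "[.,%-():'/?+]"

-- Hypothetical helper: the parentheses drop gsub's second return value (the match count).
local function replace(s, class)
  return (s:gsub(class, "_"))
end

print(replace("Magic-User", unescaped)) -- Magic-User
print(replace("Magic-User", escaped))   -- Magic_User
```

This also suggests why adding a "-" at the very start of the set appeared to work: a hyphen in that position has no character before it, so it cannot form a range.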

The list of "magic characters" is:
( ) . % + - * ? [ ^ $

So I set it to:
sSanitized = sSanitized:gsub("[%.,%-%(%):'’/%?%+–]", "_"):gsub("%s", ""):lower();
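For anyone who wants to try the escaped version outside Fantasy Grounds, here is a standalone sketch. The local `trim` is a hypothetical stand-in for StringManager.trim, and the non-ASCII ’ and – are omitted from the set, so this is not the ruleset's actual code.

```lua
-- Hypothetical stand-in for StringManager.trim (not available outside Fantasy Grounds).
local function trim(s)
  return (s:gsub("^%s+", ""):gsub("%s+$", ""))
end

local function sanitize(s)
  -- Drop a trailing parenthesised suffix such as " (Champion)", then trim.
  local sSanitized = trim((s:gsub("%s%(.*%)$", "")))
  -- Magic characters in the set are escaped with "%"; "," ":" "'" "/" need no escape.
  -- The non-ASCII characters from the original set are left out of this sketch.
  sSanitized = sSanitized:gsub("[%.,%-%(%):'/%?%+]", "_"):gsub("%s", ""):lower()
  return sSanitized
end

print(sanitize("Magic-User"))         -- magic_user
print(sanitize("Fighter (Champion)")) -- fighter
print(sanitize("Half-Elf, Wood"))     -- half_elf_wood
```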

Maybe Moon or others can test this and replace it in the 5E code. I didn't know what the last hyphen was (it DOES look different from the first one), so I didn't "escape" it.

But again - not an expert, just noticed it cos I was testing my copy of the 5E method before adding it to my utils package.

damned
January 6th, 2021, 22:25
If you convert the characters to their character codes you will see that the last one is definitely a different character.
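One way to check this is to print the bytes of each character. This is a sketch that assumes the script file is saved as UTF-8; the variable names are just for illustration.

```lua
-- Assumes this file is saved as UTF-8, so the en dash literal occupies three bytes.
local hyphen = "-"
local enDash = "–"

-- string.byte(s, 1, -1) returns the numeric value of every byte in s.
print(#hyphen, hyphen:byte(1, -1)) -- length 1, byte 45
print(#enDash, enDash:byte(1, -1)) -- length 3, bytes 226 128 147
```

The first is the plain ASCII hyphen-minus (45); the second is the UTF-8 encoding of U+2013, the en dash.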

Varsuuk
January 7th, 2021, 02:16
Yeah, I can tell visually - just wondering:

1) What is it called, so I can describe it correctly in future: for example "dash" or "hyphen" vs. a "minus"?
2) How do I enter that one on Mac or PC? I assume through each one's "special chars" mechanism, but I think #1 will help once I know that.

BUT... you pointed out one very helpful thing too: since it is NOT a "-", it is not one of the "magic characters" that I need to "escape", so it was correct for me not to preface it with "%".

Moon Wizard
January 15th, 2021, 23:12
1) There are three common "dash" types - minus, en dash, em dash.
2) I usually copy the character from a web page about encodings in order to make sure I get the right "dash" I really want when coding.
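An alternative to pasting the character is to spell it out with byte escapes, so there is no doubt which dash is in the source. The sketch below assumes UTF-8 input; `dashesToUnderscore` is a hypothetical helper, not part of the 5E ruleset. It replaces each dash as a whole sequence, because the standard Lua string library works on bytes and a multi-byte character placed inside "[]" is treated as separate bytes.

```lua
-- Decimal byte escapes for the UTF-8 encodings of each dash.
local EN_DASH = "\226\128\147" -- U+2013
local EM_DASH = "\226\128\148" -- U+2014

-- Hypothetical helper: replace each dash variant as a whole sequence.
local function dashesToUnderscore(s)
  s = s:gsub(EN_DASH, "_")
  s = s:gsub(EM_DASH, "_")
  s = s:gsub("%-", "_") -- ASCII hyphen-minus; magic in patterns, so it is escaped
  return s
end

print(dashesToUnderscore("Magic" .. EN_DASH .. "User")) -- Magic_User
print(dashesToUnderscore("Magic-User"))                 -- Magic_User
```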

I'll also fix up the escaping for that function in the next revision of the 5E ruleset.

Regards,
JPG