Thread: [5E] sanitize(s) method question
-
January 6th, 2021, 05:18 #1
I copied the sanitize method from the 5E ruleset and added it to my StringUtils package.
While testing, I passed in "Magic-User", expecting it to be converted to my classID of "magic_user". Instead, the returned string was "magic-user".
Code:
-- Used to convert non-xml names to valid xml names.
-- Replaces invalid characters with "_". In addition, converts string to lower-case.
-- @args s String to be scanned for characters to replace.
-- @returns A string with the indicated characters replaced by "_"
function sanitize(s)
    local sSanitized = StringManager.trim(s:gsub("%s%(.*%)$", ""));
    -- @TODO Review (posted on forums): I added the "-" at start of the character set since "-" was not being replaced by "_"
    sSanitized = sSanitized:gsub("[-.,-():'’/?+–]", "_"):gsub("%s", ""):lower();
    return sSanitized
end
Is it because one of those characters is being misinterpreted as a special character or escape inside the "[]" construct?
It is "working" now, BUT I'm pretty sure what I did is not the proper solution, AND for all I know it may be wrong for every OTHER case; I may have broken something doing what I did.
More regex-savvy folks, please chime in (remember from my earlier post that I am regex-challenged).
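In Lua patterns, a `-` sitting between two characters inside `[...]` is a range operator, so `,-(` means "any byte from `,` (44) to `(` (40)". That range is empty, so `,`, `-` and `(` all drop out of the set. A leading `-` is read literally, which is why moving one to the front fixed the hyphen, but the empty range is still there. A minimal sketch using ASCII-only subsets of the sets above (the curly quote and en dash are left out; `replaceWith` is an illustrative helper name):

```lua
-- ASCII-only subsets of the character sets discussed above.
local ORIGINAL = "[.,-():'/?+]"   -- 5E set: ",-(" is an (empty) range
local LEADING  = "[-.,-():'/?+]"  -- leading "-" is literal; ",-(" is still a range
local ESCAPED  = "[.,%-():'/?+]"  -- "%-" is always a literal hyphen

-- Illustrative helper: replaces every byte matched by `set` with "_".
local function replaceWith(set, s)
  return (s:gsub(set, "_"))
end

print(replaceWith(ORIGINAL, "a-b,c(d)"))  --> a-b,c(d_
print(replaceWith(LEADING, "a-b,c(d)"))   --> a_b,c(d_
print(replaceWith(ESCAPED, "a-b,c(d)"))   --> a_b_c_d_
```

Commas and opening parentheses survive both of the first two sets; only escaping the hyphen as `%-` removes the accidental range.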
-
January 6th, 2021, 13:00 #2
Probably something to do with magic characters
https://www.lua.org/pil/20.2.html
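Outside a set, the magic characters `^ $ ( ) % . [ ] * + - ?` all have special meanings; `-` in particular is a lazy "zero or more" repeat. When a pattern has to match text literally, a common idiom is to prefix every magic character with `%`. A sketch, with a hypothetical helper name `escapePattern` (not a Fantasy Grounds API):

```lua
-- Hypothetical helper: returns `s` with every Lua pattern magic character
-- (^ $ ( ) % . [ ] * + - ?) prefixed by "%", so it matches literally.
local function escapePattern(s)
  -- In the replacement, "%%" is a literal "%" and "%0" is the whole match.
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- Unescaped, "c-U" means "zero or more 'c' (lazily), then 'U'",
-- so it only matches the "U" at position 7 of "Magic-User".
print(("Magic-User"):find("c-U"))

-- Escaped ("c%-U"), it matches the literal text "c-U" at positions 5 to 7.
print(("Magic-User"):find(escapePattern("c-U")))
```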
-
January 6th, 2021, 22:18 #3
Yup, Damned, you are right.
I put a "%" in front of the first hyphen and it worked:
sSanitized = sSanitized:gsub("[.,%-():'’/?+–]", "_"):gsub("%s", ""):lower();
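For testing outside Fantasy Grounds, here is a standalone sketch of the method with the hyphen escaped. `StringManager.trim` comes from the ruleset, so a local `trim` stands in for it (assumed to strip leading and trailing whitespace), and the curly quote and en dash are written as UTF-8 byte escapes, which assumes the ruleset file is saved as UTF-8:

```lua
-- Stand-in for the ruleset's StringManager.trim (assumed behavior:
-- strip leading and trailing whitespace).
local function trim(s)
  return (s:match("^%s*(.-)%s*$"))
end

-- UTF-8 bytes of U+2019 (right single quote) and U+2013 (en dash).
local RSQUO = "\226\128\153"
local EN_DASH = "\226\128\147"

-- Characters to replace; "%-" keeps the hyphen from being read as a range.
local INVALID = "[.,%-():'" .. RSQUO .. "/?+" .. EN_DASH .. "]"

-- Converts a display name into a lower-case, XML-safe identifier.
local function sanitize(s)
  -- Drop a trailing " (...)" suffix, then trim surrounding whitespace.
  local sSanitized = trim((s:gsub("%s%(.*%)$", "")))
  -- Replace invalid characters with "_", remove whitespace, lower-case.
  sSanitized = sSanitized:gsub(INVALID, "_"):gsub("%s", ""):lower()
  return sSanitized
end

print(sanitize("Magic-User"))          --> magic_user
print(sanitize("Half-Elf (Variant)"))  --> half_elf
```

The second input is only illustrative: the trailing parenthesized part is stripped before the replacement, so it comes back as `"half_elf"`.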
The list of "magic characters" is:
( ) . % + - * ? [ ^ $
So I escaped every magic character in the set:
sSanitized = sSanitized:gsub("[%.,%-%(%):'’/%?%+–]", "_"):gsub("%s", ""):lower();
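Inside a set, only `%`, a `-` between two characters, a leading `^` and the closing `]` are special, so the extra `%` signs in front of `.`, `(`, `)`, `?` and `+` are harmless but not required. A sketch that checks every byte value against three versions of the set (non-ASCII characters written as UTF-8 byte escapes, assuming a UTF-8 source file; `sameSet` is an illustrative name):

```lua
local RSQUO, EN_DASH = "\226\128\153", "\226\128\147"

-- The 5E set, the first fix (hyphen escaped) and the second (all escaped).
local ORIGINAL = "[.,-():'" .. RSQUO .. "/?+" .. EN_DASH .. "]"
local MINIMAL  = "[.,%-():'" .. RSQUO .. "/?+" .. EN_DASH .. "]"
local FULL     = "[%.,%-%(%):'" .. RSQUO .. "/%?%+" .. EN_DASH .. "]"

-- Returns true if both sets match the same single bytes; otherwise
-- false plus the first byte value on which they disagree.
local function sameSet(p1, p2)
  for b = 0, 255 do
    local ch = string.char(b)
    if (ch:find(p1) ~= nil) ~= (ch:find(p2) ~= nil) then
      return false, b
    end
  end
  return true
end

print(sameSet(MINIMAL, FULL))   -- true
print(sameSet(ORIGINAL, FULL))  -- false, first mismatch at byte 40 "("
```

Both fixes select the same bytes; the 5E set is also missing `,` (44) and `-` (45).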
Maybe Moon or others can test this and update the 5E code. I didn't know what the last "hyphen" was; it DOES look different from the first one, so I didn't escape it.
But again, I'm not an expert; I just noticed it because I was testing my copy of the 5E method before adding it to my utils package.
-
January 6th, 2021, 22:25 #4
If you look at the character codes, you will see that the last one is definitely a different character.
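Strictly speaking, the en dash (U+2013) is not ASCII at all; ASCII only has the hyphen-minus, byte 45. Dumping the bytes shows the difference, and also a side effect worth knowing: Lua patterns work on bytes, so inside `[...]` a multi-byte UTF-8 character contributes each of its bytes separately. A sketch, assuming UTF-8 text (`bytesOf` is an illustrative helper):

```lua
-- Illustrative helper: byte values of a string, space-separated.
local function bytesOf(s)
  local t = {}
  for i = 1, #s do
    t[#t + 1] = tostring(s:byte(i))
  end
  return table.concat(t, " ")
end

local HYPHEN = "-"              -- U+002D hyphen-minus (ASCII)
local EN_DASH = "\226\128\147"  -- U+2013 en dash, UTF-8 encoded

print(bytesOf(HYPHEN))   --> 45
print(bytesOf(EN_DASH))  --> 226 128 147

-- Each byte of the en dash becomes its own member of the set, so one
-- en dash in the input turns into three underscores.
print((("a" .. EN_DASH .. "b"):gsub("[" .. EN_DASH .. "]", "_")))  --> a___b
```

Because the curly quote in the 5E set shares its first two bytes (226 128) with the en dash, other characters starting with those bytes, such as the em dash (226 128 148), would only be partly replaced.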
-
January 7th, 2021, 02:16 #5
Yeah, I can tell visually; I'm just wondering:
1) What it is called, so I describe it correctly in future, for example "dash" or "hyphen" vs. "minus".
2) How do I type that one on a Mac or PC? I assume through each platform's special-characters mechanism, but I think knowing #1 will help with that.
But you also pointed out something very helpful: since it is NOT a "-", it is not one of the magic characters that I need to escape, so it was correct for me not to prefix it with "%".
-
January 15th, 2021, 23:12 #6
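1) There are three common "dash" types: minus (strictly, the hyphen-minus "-", U+002D), en dash (U+2013) and em dash (U+2014). The one at the end of that pattern is an en dash.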
2) I usually copy the character from a web page about encodings in order to make sure I get the right "dash" I really want when coding.
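To see the dashes side by side (plus the true minus sign, U+2212), a small sketch that encodes a code point as UTF-8 and prints its bytes. It avoids the `utf8` library, which only exists from Lua 5.3, and bitwise operators, in case the host runs an older Lua; `utf8Char` is an illustrative name and only handles code points below U+10000:

```lua
-- Encodes a Unicode code point below U+10000 as a UTF-8 string.
-- Uses arithmetic instead of bitwise operators (Lua 5.1 compatible).
local function utf8Char(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40),
                       0x80 + cp % 0x40)
  else
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

local DASHES = {
  { "hyphen-minus", 0x002D },
  { "en dash",      0x2013 },
  { "em dash",      0x2014 },
  { "minus sign",   0x2212 },
}

-- Print each dash's name, code point and UTF-8 byte values.
for _, d in ipairs(DASHES) do
  local s = utf8Char(d[2])
  print(string.format("%-12s U+%04X  bytes: %s", d[1], d[2],
                      table.concat({ s:byte(1, -1) }, " ")))
end
```

Only the hyphen-minus is a single ASCII byte; the other three are three bytes each and all begin with byte 226.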
I'll also fix up the escaping for that function in the next revision of the 5E ruleset.
Regards,
JPG