Lately I’ve been working with the Rich Text Editor and found that I need more than the RTE offers. Recently I needed a function to strip HTML tags. If you have a PHP background, you may remember the strip_tags function from PHP 4 and PHP 5, which strips all HTML tags except the ones you specify as allowable.
My function receives two string parameters:
- html: the string with the HTML
- tags: a string of allowable tags separated by commas
In a few words, here is what it does step by step:
- split the allowable tags string on commas, ignoring any whitespace around them
- discard empty entries
- search for tags in the HTML string
- add each found tag that is not in the allowable list to the “to be removed” array
- remove those tags from the HTML string
    public static function stripHtmlTags(html:String, tags:String = ""):String
    {
        var tagsToBeKept:Array = new Array();
        if (tags.length > 0)
            tagsToBeKept = tags.split(new RegExp("\\s*,\\s*")); // split on commas, ignoring whitespace

        var tagsToKeep:Array = new Array();
        for (var i:int = 0; i < tagsToBeKept.length; i++)
        {
            if (tagsToBeKept[i] != null && tagsToBeKept[i] != "")
                tagsToKeep.push(tagsToBeKept[i]); // keep only non-empty entries
        }

        var toBeRemoved:Array = new Array();
        var tagRegExp:RegExp = new RegExp("<([^>\\s]+)(\\s[^>]+)*>", "g"); // matches any opening or closing tag
        var foundedStrings:Array = html.match(tagRegExp);

        for (i = 0; i < foundedStrings.length; i++)
        {
            var tagFlag:Boolean = false; // true when the found tag is allowable
            if (tagsToKeep != null)
            {
                for (var j:int = 0; j < tagsToKeep.length; j++)
                {
                    var tmpRegExp:RegExp = new RegExp("<\/?" + tagsToKeep[j] + "( [^<>]*)*>", "i");
                    var tmpStr:String = foundedStrings[i] as String;
                    if (tmpStr.search(tmpRegExp) != -1)
                        tagFlag = true;
                }
            }
            if (!tagFlag)
                toBeRemoved.push(foundedStrings[i]);
        }
        for (i = 0; i < toBeRemoved.length; i++)
        {
            var tmpRE:RegExp = new RegExp("([\\\\.+*?^$(){}|\\[\\]\\/])", "g"); // all regex metacharacters, including "?" and "."
            var tmpRemRE:RegExp = new RegExp((toBeRemoved[i] as String).replace(tmpRE, "\\$1"), "g");
            html = html.replace(tmpRemRE, ""); // remove every occurrence of this exact tag
        }
        return html;
    }
Usage:
stripHtmlTags(myInput.htmlText, "b, i, u")
This removes every tag except bold, italic, and underline. See the following working example.
Tags: ActionScript, HTML, Rich Text Editor, RTE
This post was written by Andrei Ionescu

thanks a lot : )
Small oversight:
This code won’t work as is without the second parameter.
If the second parameter is null, it will crash on the 2nd line, where you check “tags.length”. An easy fix is to make the 2nd parameter default to an empty string (“”), which has a valid length.
Thanks JohnG! I’ve modified the code also.
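A note on this fix: a default value of "" covers the case where the argument is omitted, but a caller can still pass null explicitly, and reading tags.length on null throws a TypeError. Below is a minimal sketch of a guard. The class TagListUtil and the helper normalizeTags are hypothetical names used for illustration; they are not part of the original function.

```actionscript
package
{
    public class TagListUtil
    {
        // Hypothetical helper, not part of the original post.
        // Returns "" when tags is null, so that tags.length is always safe to read.
        public static function normalizeTags(tags:String):String
        {
            return (tags == null) ? "" : tags;
        }
    }
}
```

Calling tags = TagListUtil.normalizeTags(tags); as the first statement of stripHtmlTags would make both an omitted and a null second argument behave as "no allowable tags".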
I’ve been playing with the code but can’t seem to get it to keep normal HTML code. It just strips it to regular text. Here is what I had:
stripHtmlTags(inputField.htmlText,",<B>,</B>");
Can you give me an example of how to use it correctly?
Thanks.
Hello Brandon! Use it like this:
stripHtmlTags(inputField.htmlText, "b");
There is no need to include “<” or “>”, to use upper case, or to specify the closing tag. I’ve updated the post with an example of how to use it.
Excellent piece of code, thanks a lot.
I was looking for a solution in C# and came across this.
So I converted your code to work in C# and the dotnet framework.
If anybody is interested please let me know.
Cheers!
Pelotudo, do you mind sharing the c# code please. Thanks!
Hello, Donny! Pelotudo gave me the code but didn’t have time to post it. I’ll post it soon.
Saved me a good hour or so from having to write a similar function myself! Thanks!
Thank you so much
just what I was looking for
legend! this helped me out of a tight spot, i was trying to get data from an arrayCollection in a DataGrid and put it in a SQL database. The output had XML tags. This works a treat…
Thanks for this really useful function…
This is awesome m8! thnx very much!
Helps when using the mailto: urlRequest, since tagged htmlText doesn’t display nicely or at all!
I love you!
I soooo needed that piece of code!
Thanks
the function doesn’t work in that if you specify i as an allowable tag, it also allows iframe because the regular expressions see the i in iframe and consider it safe. this is incredibly insecure. i can’t believe in a year and a half nobody noticed this.
Here I fixed it http://fightskillz.com/2010/01/flexactionscript-3-0-strip-html-tags-function/
Thanks Yoav! I’ll fix my code too.
Hello Yoav! I just fixed my code and created a working application. I tested your solution and found that sometimes it doesn’t remove the closing tags (</iframe>, for example). So I found another, better way of fixing the bug you discovered. Thank you very much for your real help.
Glad I could help evolve it. The changes you made are great because it lets you specify p as an allowable tag while using attributes like style=”color:#0000FF;”, but in the specific application I’m using it in I need stricter control. I can’t reproduce the closing tags not being removed issue though.
My changes still work for me in cases where you just want certain bare tags to get through. I guess it depends on the situation. Thanks for letting me know, also the working example you posted is very useful.
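To make the prefix problem discussed in this exchange concrete, here is a minimal sketch that contrasts a loose check with a stricter one. The class and function names (TagMatchSketch, matchesLoosely, matchesStrictly) are hypothetical, and the patterns are one possible way to write such a check, not necessarily the exact pattern used in either version of the function.

```actionscript
package
{
    public class TagMatchSketch
    {
        // Loose check: only requires the allowed name to follow "<" or "</".
        // With allowed = "i" it also accepts "<iframe>", which is the reported bug.
        public static function matchesLoosely(tag:String, allowed:String):Boolean
        {
            var loose:RegExp = new RegExp("<\\/?" + allowed, "i");
            return tag.search(loose) != -1;
        }

        // Stricter check: after the name there may only be whitespace-led
        // attributes, an optional self-closing slash, and the closing ">".
        public static function matchesStrictly(tag:String, allowed:String):Boolean
        {
            var strict:RegExp = new RegExp("^<\\/?" + allowed + "(\\s[^<>]*)?\\/?>$", "i");
            return tag.search(strict) != -1;
        }
    }
}
```

With these definitions, matchesLoosely("<iframe>", "i") is true, while matchesStrictly("<iframe>", "i") is false and matchesStrictly("<i>", "i") is true. The strict form still lets attributes through, so matchesStrictly("<p style=\"color:#0000FF;\">", "p") is also true; an application that needs bare tags only would drop the attribute group.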
Hi!
I found a strange behaviour in the code:
If you try to strip the ‘a’ tags in the following String
you will get
It’s because of the “?” character I think, but so far I have no solution for it. I will post back if I find one.
Cheers
Got it: replace line 58 with this
Thanks for your fix, Jörg. You mean line 58 in the whole source file, but I guess it is line 36 in the code presented in the article.
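For readers who hit the same problem: a tag that contains characters such as “?” or “.” changes meaning when it is turned into a regular expression, so it no longer matches itself literally. The sketch below shows two general ways around this. It is not Jörg’s fix, and the names LiteralRemoveSketch, escapeRegExp and removeLiteral are hypothetical.

```actionscript
package
{
    public class LiteralRemoveSketch
    {
        // Puts a backslash in front of every character that has a special
        // meaning in a regular expression, so the result matches the input literally.
        public static function escapeRegExp(text:String):String
        {
            return text.replace(/[\\^$.*+?()\[\]{}|\/]/g, "\\$&");
        }

        // Alternative that avoids regular expressions entirely: split on the
        // literal string and join the pieces back together.
        public static function removeLiteral(html:String, literal:String):String
        {
            return html.split(literal).join("");
        }
    }
}
```

Either new RegExp(LiteralRemoveSketch.escapeRegExp(tag), "g") or LiteralRemoveSketch.removeLiteral(html, tag) can then be used to remove a tag such as one whose href contains a query string. The split and join form is the simpler of the two when no pattern matching is needed.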