Someone on the CFDJ list recently had a problem that went something like this: he had a bunch of user-created HTML locked up in his database. When rendering the HTML, he needed to pull out all the <h*> elements and use their contents to populate an @id attribute on each one. He then wanted to add an ordered list to the top of the page, containing links to those IDs.
So he had this:

    ...
    <h1>This is a heading</h1>
    <p>This is a paragraph</p>
    <h2>This is another heading</h2>
    <p>This is another paragraph</p>
    ...

...and wanted to turn it into this:

    ...
    <ol>
      <li><a href="#This-is-a-heading">Heading 1</a></li>
      <li><a href="#This-is-another-heading">Heading 2</a></li>
    </ol>
    <h1 id="This-is-a-heading">This is a heading</h1>
    <p>This is a paragraph</p>
    <h2 id="This-is-another-heading">This is another heading</h2>
    <p>This is another paragraph</p>
    ...

A number of folks suggested a series of ever-more-complicated regular expressions to address the problem, but I've found it much easier to just put jTidy and CFMX's native XML tools to work.
- Right up front, go download a copy of jTidy and drop it into your classpath.
- Borrow Greg's makexHtmlValid() function and add it to a component called jtidy.cfc.
- For parsing insurance, add jTidy.setXmlOut(true); to the function, right after the other jTidy.set* statements.
- In the same directory with jtidy.cfc, create this file:
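
    <!--- Sample input standing in for the HTML pulled from the database --->
    <cfsavecontent variable="page">
    <h1>Heading Number <i>1</i></h1>
    <p>this is a paragraph</p>
    <h2>Heading Number 2</h2>
    <p>Another paragraph goes here.</p>
    </cfsavecontent>

    <!--- Run the markup through jTidy so that XmlParse() gets well-formed XML --->
    <cfinvoke component="jtidy" method="makexHtmlValid" strtoparse="#page#" returnvariable="content" />
    <cfset myxml = XmlParse(content) />

    <!--- Match h1 through h6 only: the second character must be a digit, which keeps <hr> out --->
    <cfset myheadings = XmlSearch(myxml, "//*[starts-with(name(),'h') and string-length(name()) = 2 and contains('123456', substring(name(), 2))]") />
    <cfdump var="#myheadings#">

    <cfloop index="i" from="1" to="#ArrayLen(myheadings)#">
        <!--- Serialize the heading, then strip its own opening and closing tags --->
        <cfset dummy = ToString(myheadings[i]) />
        <cfset dummy = REReplaceNoCase(dummy, "<#myheadings[i].xmlname#[^>]*>", "", "ONE") />
        <cfset dummy = ReplaceNoCase(dummy, "</#myheadings[i].xmlname#>", "", "ONE") />
        <cfset dummy = Replace(dummy, " ", "-", "ALL") />
        <!--- Drop any remaining markup (inline tags, the XML declaration) and set the id --->
        <cfset myheadings[i].xmlattributes.id = Trim(REReplace(dummy, "<[^>]*>", "", "ALL")) />
    </cfloop>
    <cfdump var="#myxml#">

To output the results as HTML, just use: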

    <cfoutput>#ToString(myxml)#</cfoutput>

In my opinion, the result is easier to understand, and a whole lot more flexible than using a regex or three. How about you?
UPDATE: I was missing a Trim() in the id-setting code, which caused unwanted artifacts to show up in the final HTML.
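
The listing above sets the id attributes but stops short of building the ordered list of links from the original request. A minimal, untested sketch of that last step follows. It assumes myxml and myheadings from the listing, an id already set on every heading, and that jTidy's output contains a body element. The names toc, item, link, and bodies are only illustrative. If the parsed document carries the XHTML default namespace, the new elements may need to be created in that namespace as well.

```cfml
<!--- Sketch only: build an <ol> of links, one per heading found earlier --->
<cfset toc = XmlElemNew(myxml, "ol") />
<cfloop index="i" from="1" to="#ArrayLen(myheadings)#">
    <cfset item = XmlElemNew(myxml, "li") />
    <cfset link = XmlElemNew(myxml, "a") />
    <!--- "##" is an escaped pound sign, so the href comes out as #the-id --->
    <cfset link.XmlAttributes.href = "##" & myheadings[i].XmlAttributes.id />
    <cfset link.XmlText = "Heading #i#" />
    <!--- Finish each node before attaching it to its parent --->
    <cfset ArrayAppend(item.XmlChildren, link) />
    <cfset ArrayAppend(toc.XmlChildren, item) />
</cfloop>

<!--- name() is used for the same reason as in the heading search --->
<cfset bodies = XmlSearch(myxml, "//*[name()='body']") />
<cfif ArrayLen(bodies)>
    <!--- Put the list ahead of everything else in the body --->
    <cfset ArrayInsertAt(bodies[1].XmlChildren, 1, toc) />
</cfif>
```

Run this before the final ToString(myxml) call so the list is part of the serialized page.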
12-29-2004 05:53:44PM
category: XML
related topics: (CFMX) (XHTML) (jTidy) (CFDJ) (list)