
This code has been declared by its author to be Public Domain code.


deHTML by xlsior, 2008
A function that strips all HTML tags from a string and returns the remaining text.
' Basic DeHTML function
' Strips HTML tags and keeps the readable non-HTML portions
'
' By Marc van den Dikkenberg / xlsior
' http://www.xlsior.org
'
' For more extended HTML codes, see: http://www.ascii.cl/htmlcodes.htm
' Note that the extended HTML codes such as the pound symbol are currently ignored.
'


SuperStrict

Local Sample:String="<h1>This</h1><br><b><i> is</i> just</b> some <p>&quot;sample&quot;-html</p> <i>code</i><br><br><br><br>"

Print "Unmodified String: "
Print Sample
Print ""
Print "Stripped String: "
Print DeHTML(Sample)


Function DeHTML:String(SomeString:String)
	Local dehtmlmode:Int=False
	Local detempstring:String=""
	Local detempcounter:Int=0

	' Analyze the string for information in between <...> tags, and strip them all.
	' Mid() is 1-based, so the scan starts at position 1.
	For detempcounter=1 To Len(SomeString)
		If Mid(SomeString,detempcounter,1)="<" Then
			dehtmlmode=True
		ElseIf Mid(SomeString,detempcounter,1)=">" Then
			dehtmlmode=False
		ElseIf dehtmlmode=False Then
			' Keep characters that are outside of any tag
			detempstring=detempstring+Mid(SomeString,detempcounter,1)
		End If
	Next

	' Decode the basic &..; HTML codes only after the tags are gone, so that a decoded
	' "<" or ">" is not mistaken for a tag. "&amp;" goes last to avoid double decoding.
	If Instr(detempstring,"&")<>0 And Instr(detempstring,";")<>0 Then
		detempstring=Replace(detempstring,"&nbsp;"," ")
		detempstring=Replace(detempstring,"&quot;",Chr$(34))
		detempstring=Replace(detempstring,"&lt;","<")
		detempstring=Replace(detempstring,"&gt;",">")
		detempstring=Replace(detempstring,"&amp;","&")
	End If

	Return detempstring
End Function
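
The header comment notes that extended HTML codes such as the pound symbol are currently ignored. A minimal sketch of one way to handle a few of them is shown here. The function name DecodeExtended is hypothetical and not part of the original entry, the list of named codes is only an illustrative subset, and only decimal numeric codes of the form &#NNN; are handled.

```blitzmax
SuperStrict

' Hypothetical helper (an assumption, not from the original archive entry):
' decodes a small subset of extended named codes and decimal &#NNN; codes.
Function DecodeExtended:String(Source:String)
	' Named codes: illustrative subset only, not a complete table
	Source=Replace(Source,"&pound;",Chr(163))
	Source=Replace(Source,"&copy;",Chr(169))
	Source=Replace(Source,"&euro;",Chr(8364))

	' Decimal numeric codes such as &#163;
	Local pos:Int=Instr(Source,"&#")
	While pos<>0
		Local semi:Int=Instr(Source,";",pos)
		If semi=0 Then Exit

		' Text between "&#" and ";" must be 1 to 5 decimal digits
		Local digits:String=Mid(Source,pos+2,semi-pos-2)
		Local valid:Int=(Len(digits)>0 And Len(digits)<=5)
		For Local i:Int=0 Until Len(digits)
			If digits[i]<48 Or digits[i]>57 Then valid=False
		Next

		Local code:Int=digits.ToInt()
		If valid And code>0 And code<65536 Then
			' Swap the whole &#NNN; sequence for the single character
			Source=Left(Source,pos-1)+Chr(code)+Mid(Source,semi+1)
			pos=Instr(Source,"&#",pos+1)
		Else
			' Not a decimal code: leave it untouched and keep searching
			pos=Instr(Source,"&#",pos+2)
		End If
	Wend

	Return Source
End Function
```

One possible use is to run it on the output of the function above, for example Print DecodeExtended(DeHTML(Sample)), so that the extended codes are decoded only after the tags have been stripped.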

Comments

xlsior, 2008
NOTE: The web forum mangled some of the HTML codes when I posted this program, so I had to fudge the listing a little.

If the Replace commands in your copy list the HTML codes with additional spaces (like: & amp ; ), replace them with the proper values, without the spaces (like: &amp; ).

Likewise, the double quotes around the word sample in the sample string should say &quot; to show the effect.

The program only works properly once those spaces are removed.

