apostrophe problem

julianbury(Posted 2007) [#1]
Greetings, Programmers :-)

Filenames with apostrophes (single quotes) in them are giving me a problem.

I maintain a web site which includes news items of interest. The news items are in the form of html pages whose names are also the headlines.

The problem is that if these page names include any single quotes, they fail to appear in the generated index.

I am therefore forced to remove any apostrophes before running my indexing app.

Can you suggest a solution that allows the inclusion of apostrophes without losing the news item?

I am including the source code as a reply.

You might possibly find it useful yourself once this anomaly is fixed, in which case, you are welcome to use it :-)

Thank you for your time and trouble (-_-)

Julian Bury (joolian.net)


julianbury(Posted 2007) [#2]
OK, here is the source:

; ===================================================================================
; 	news.bb   written by Julian Bury
; ===================================================================================
;	This utility expects to find a folder named "files" in the same folder as this program.
;	Within the files folder, it expects to find more folders with date names.
;	Within the date-named folders, it expects to find .htm files.
;	It compiles an "index.htm" file of the contents of all the date folders.
; ===================================================================================

Global  csd, fileout, filein, kind, dmax, fmax, i, j, f
Global AppDir$, NewsDir$, file$, folder$, temp$, f1$, f2$
Dim dirname$(1000)
Dim filename$(1000)

AppDir$=CurrentDir$()
NewsDir$=CurrentDir$()+"\files"

If FileType(AppDir$+"\index.htm")=1 Then
	DeleteFile AppDir$+"\index.htm"
EndIf

fileout=WriteFile("index.htm")

WriteLine(fileout,"<!DOCTYPE html PUBLIC "+Chr$(34)+"-//W3C//DTD XHTML 1.0 Transitional//EN"+Chr$(34))
WriteLine(fileout,Chr$(34)+"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"+Chr$(34)+">")
WriteLine(fileout,"<html><head>")
WriteLine(fileout,"<meta http-equiv="+Chr$(34)+"content-type"+Chr$(34)+" content="+Chr$(34)+"text/html; charset=iso-8859-1"+Chr$(34)+" />")
WriteLine(fileout,"<link rel="+Chr$(34)+"stylesheet"+Chr$(34)+" type="+Chr$(34)+"text/css"+Chr$(34)+" href="+Chr$(34)+"index.css"+Chr$(34)+" />")
WriteLine(fileout,"<title>THE NEWS</title></head><body>")

;Make an array of folder names
dmax=0
fmax=0
csd=ReadDir(NewsDir$)

folder$=NextFile(csd)
folder$=NextFile(csd)
Repeat
	folder$=NextFile(csd)
	If folder$>""
		dirname$(dmax)=folder$
		dmax=dmax+1
	EndIf
Until folder$=""
dmax=dmax-1

; Sort the folder names alphabetically
For a=dmax To 1 Step -1
	For b=0 To a-1
		f1$=dirname$(b)
		f2$=dirname$(b+1)
		If Upper$(f1$) < Upper$(f2$)
			dirname$(b)=f2$
			dirname$(b+1)=f1$
		EndIf
	Next
Next

;	get the filenames from each folder and make links to them
For d=0 To dmax
	FileDir$=dirname$(d)
	WriteLine(fileout,Chr(10))
	WriteLine(fileout,"<br /></blockquote><b>"+dirname$(d)+"</b><blockquote>")
	csd=ReadDir(AppDir$+"files\"+dirname$(d))
	fmax=0
	file$=NextFile(csd)
	file$=NextFile(csd)
	Repeat
		file$=NextFile(csd)
		If file$>"" Then
			tx$=Right(file$,3)
			If tx$="htm"
				title$=Left(file$,Len(file$)-4)
				WriteLine(fileout,"<a href="+Chr$(34)+"files/"+FileDir$+"/"+file$+Chr$(34)+">"+title$+"</a><br />")
			EndIf
		EndIf
	Until file$=""
Next

WriteLine(fileout,"<hr /></body></html>")
CloseFile fileout
Notify "News Indexing is Complete"
End



Kevin(Posted 2007) [#3]
Hi julianbury,

Would you be able to convert the apostrophe to the HTML entity &#39; so that it can be displayed?
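
A minimal sketch of that conversion, assuming the built-in Replace$ function; EscapeApostrophes$ is a hypothetical helper name and is not part of news.bb. It only changes the visible link text, and whether it cures the missing entries is untested:

```blitzbasic
; Hypothetical helper, not part of news.bb.
; Returns txt$ with every apostrophe (ASCII 39) replaced
; by the HTML entity &#39;
Function EscapeApostrophes$(txt$)
	Return Replace$(txt$, Chr$(39), "&#39;")
End Function

; Made-up sample title, shown in a message box
title$ = "Today's headline"
Notify EscapeApostrophes$(title$)
```

In news.bb this would presumably be applied to title$ just before the link line is written.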

Regards,

Kevin.


julianbury(Posted 2007) [#4]
What troubles me is that sometimes the title with apostrophes works perfectly and sometimes it disappears.

I have not been able to find reasons for this.

(-_-) ???


Vic 3 Babes(Posted 2007) [#5]
Hello, Julian,

I've looked at the code, and can't see how it affects it, but the first thing that occurred to me was that perhaps some headlines use ASCII 96 for the apostrophe - which might explain why it works sometimes and not others - there are 2 apostrophes in the ASCII set.

I can't think of anything else.


xlsior(Posted 2007) [#6]
The first thing that occurred to me was that perhaps some headlines use ASCII 96 for the apostrophe


ASCII 96 is the backtick (AKA 'Grave accent'), not apostrophe... the backtick leans left, while the apostrophe either leans right or is completely vertical depending on the font.


Andy_A(Posted 2007) [#7]
Could you run the problem titles through a loop to show the ASCII values of the "apostrophes" that are giving you problems?

If you are grabbing the filenames from a web site then you might also need to filter for these HTML symbols: &#145;, &#146;, &#180;, and &#96; (Kevin's link to HTML symbols).

While what xlsior says is correct, it may be any of the aforementioned HTML symbols and still "look" like an apostrophe depending on the font.
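
Something along these lines could serve as that loop. It is only a sketch: DumpCodes, the codes.txt output file, and the sample title are made up for illustration and are not part of news.bb.

```blitzbasic
; Hypothetical diagnostic, not part of news.bb.
; Writes name$ and then the character code of each of its
; characters to an already open file.
Function DumpCodes(report, name$)
	WriteLine(report, name$)
	For i = 1 To Len(name$)
		c$ = Mid$(name$, i, 1)
		WriteLine(report, "  " + c$ + " = " + Asc(c$))
	Next
End Function

report = WriteFile("codes.txt")	; assumed output file name
DumpCodes(report, "Today's headline.htm")	; made-up sample title
CloseFile report
```

A plain apostrophe should be listed as 39. Any of 96, 145, 146 or 180 in the report would point to one of the look-alike characters mentioned above.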