Post by Chuck Pelto
Is anyone familiar with a way to get all the words out of a string as
succinct elements?
Regards,
Chuck
If by 'succinct elements' you mean words separated by spaces, the split
function will do the work.
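For that simple case, a minimal sketch could look like this (the array name aMots and the sample text are only placeholders for illustration):

```realbasic
' Split cuts the string at every space and returns a String array.
dim aMots() as string
aMots = Split("The quick brown fox", " ")
' aMots(0) is "The", aMots(3) is "fox", and UBound(aMots) is 3.
```

Note that two spaces in a row would produce an empty element, so the text may need some cleaning first.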
But if you're working on a text, the words will often be separated by
periods, commas, and other delimiters. You would need to run the split
function several times over: first to cut the text into sentences
(delimited by periods), then into clauses (delimited by commas), and
finally into words (delimited by spaces). I couldn't find any plug-in
that gets the words out of a text in a straightforward way. (Personal
note: I'm interested in lexicometry (statistics applied to texts). If
anyone shares that interest, please let me know of any RB plug-ins or
any application (free and running on a Mac) for that kind of work.)
So if that's what you're looking for, here is some code:
' 1) s is the string holding the text you want to extract the words
' from. The do...loop replaces double spaces with single spaces (it
' seems that when you paste a text into an editfield, some spaces or
' carriage returns are added). To detect when nothing is left to
' replace, we keep the previous value in a temporary string: st.
dim st as String
do
  st = s
  s=ReplaceAll(s,"  "," ")
loop until st=s
s=LTrim(s)
s=RTrim(s)
'2) nbChar is the number of chars of the text. It will be useful later.
dim nbChar as Integer
nbChar=s.len()
'3) We extract each word and store it in the array aListeMots (the
' French for aListWords, if you wonder), which needs to be declared
' first as a String array.
' aListeMotsC1 and aListeMotsCd are two other arrays (of Integer) that
' store the positions of the first and last char of each word stored in
' aListeMots. Maybe you don't need this information; in that case, some
' lines of code can be deleted. And if you do, you can probably use a
' single multidimensional array instead of three, but I wasn't sure how
' to do that.
'The string separateurs ('delimiters') is a list of chars that should
' be treated like blank spaces, inasmuch as they separate words. It
' includes rc, a variable holding the carriage return
' (EndOfLine.Macintosh in RB).
dim separateurs as string
dim rc as string
rc=EndOfLine.Macintosh
separateurs=",.! ?'¡¿:;<>()"+rc
dim i,j,c1,cd as integer
dim vChar,vChar2 as string
redim aListeMots (-1)
redim aListeMotsC1 (-1)
redim aListeMotsCd (-1)
i=1
c1=0
cd=0
'The following loop reads each char and checks whether it is a
' delimiter. If it is not, a word starts there: the inner loop then
' moves forward to the delimiter that ends the word (or to the end of
' the text), and aListeMots, aListeMotsC1 and aListeMotsCd are filled
' with the word and the positions of its first and last characters.
do until i>nbChar
  vChar = mid(s,i,1)
  if inStr(separateurs, vChar)=0 then
    c1=i
    j=i+1
    vChar2 = mid(s,j,1)
    do until j>nbChar or inStr(separateurs,vChar2)>0
      j=j+1
      vChar2 = mid(s,j,1)
    loop
    cd=j-1
    aListeMots.append mid(s,c1,(cd-c1+1))
    aListeMotsC1.append c1
    aListeMotsCd.append cd
    i=j
  else
    i=i+1
  end if
loop
The work is done. For the famous sentence "The quick brown fox jumps
over the lazy dog", the three arrays would hold:

aListeMots   aListeMotsC1   aListeMotsCd
The          1              3
quick        5              9
brown        11             15
fox          17             19
jumps        21             25
over         27             30
the          32             34
lazy         36             39
dog          41             43
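If you only need the words and not their positions, a shorter variant might be enough. This is only a sketch of a different technique, not the loop above: it turns every delimiter into a space and lets Split do the rest. It assumes s and separateurs are set up as in the code above; the names k, t, prev and aMotsSeuls are made up for the example.

```realbasic
' Sketch only: keeps the words, loses the positions.
dim k as integer
dim t, prev as string
dim aMotsSeuls() as string
t = s
' Turn every delimiter into a space.
for k = 1 to Len(separateurs)
  t = ReplaceAll(t, Mid(separateurs, k, 1), " ")
next
' Collapse runs of spaces so that Split returns no empty elements.
do
  prev = t
  t = ReplaceAll(t, "  ", " ")
loop until prev = t
t = Trim(t)
aMotsSeuls = Split(t, " ")
```

The trade-off is that the text is rewritten before it is split, so the original character positions cannot be recovered from the result.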
Hope this helps.
Regards,
Octave