Discussion:
Cases vs. StringCases vs. Select and StringMatchQ vs. StringFreeQ
David Latin
2010-07-22 09:42:02 UTC
Hello,
I am currently working on manipulating data in "vCard"-like format, and have
become confused by the actions of the Cases, StringCases and Select
functions.
Consider the small list:

In[1]:= list = {"DTEND:19260412T175900", "DTEND:20070207T050000",
"END:VCALENDAR", "MM"} ;

In[2]:= Cases[list, ___~~"END:"~~___]
Out[2]= {}
So pattern-matching obviously does not work with Cases for a list of
strings.

The documentation for Cases does not refer to patterns in strings, so I
tried

In[3]:= StringCases[list, ___~~"END:"~~___]
Out[3]=
{{"DTEND:19260412T175900"},{"DTEND:20070207T050000"},{"END:VCALENDAR"},{}}
The problem here is that empty elements can be returned.

So next I tried

In[4]:= Select[list, ___~~"END:"~~___]
Out[4]= {}
Obviously not working.

Next I tried

In[5]:= Select[ list, StringMatchQ[#, "*END:*"] & ]
Out[5]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}

This is fine.
But what if I only want the "END:" lines and not the "DTEND:" lines?

It may be appropriate to make use of

In[6]:= Select[ list, StringFreeQ[#, "*DTEND:*"] & ]
Out[6]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR",
"MM"}
Not as expected!

But, in the end, what works is:

In[7]:= Select[ list, StringMatchQ[#, "*END:*"] && ! StringMatchQ[#,
"*DTEND:*"] & ]
Out[7]= {"END:VCALENDAR"}

I know I could have used "END*" instead of "*END*", but that's not the point
here.

My questions then are:
Why doesn't Cases work for a list of strings?
Why doesn't Select work for patterns with the ~~ operator?
Why doesn't StringFreeQ act in the same way as !StringMatchQ?

Any help over this confusion would be very much appreciated!
Thank you,
David
Bill Rowe
2010-07-23 11:11:57 UTC
> Hello, I am currently working on manipulating data in "vCard"-like
> format, and have become confused by the actions of the Cases,
> In[1]:= list = {"DTEND:19260412T175900", "DTEND:20070207T050000",
> "END:VCALENDAR", "MM"} ;
> In[2]:= Cases[list, ___~~"END:"~~___]
> Out[2]= {}
> So pattern-matching obviously does not work with Cases for a list of
> strings.
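Patterns and string patterns simply aren't the same. Cases treats
___~~"END:"~~___ as an ordinary pattern, which can only match an
expression whose head is StringExpression. Each element of your list
is a string, an atom with head String, so nothing matches. To use a
string pattern inside Cases, wrap it in StringMatchQ. So, do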

In[12]:= Cases[list, _?(StringMatchQ[#, ___ ~~ "END:" ~~ ___] &)]

Out[12]= {DTEND:19260412T175900,DTEND:20070207T050000,END:VCALENDAR}
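A minimal sketch of why In[2] returns {}, assuming the list from In[1]. To Cases, each string is an atom with head String, while ___~~"END:"~~___ is an ordinary expression with head StringExpression, so there is nothing for it to match. The last line should select the same elements as In[12], using Condition (/;) instead of PatternTest (?); the pattern name s is illustrative only.

```mathematica
list = {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"};

(* A string is an atom whose head is String ... *)
AtomQ["END:VCALENDAR"]                          (* True *)
Head["END:VCALENDAR"]                           (* String *)

(* ... so a StringExpression used as an ordinary pattern does not match it *)
MatchQ["END:VCALENDAR", ___ ~~ "END:" ~~ ___]   (* False *)

(* Same selection as In[12], written with Condition; s is an illustrative name *)
Cases[list, s_String /; StringMatchQ[s, ___ ~~ "END:" ~~ ___]]
(* {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"} *)
```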
> The documentation for Cases does not refer to patterns in strings,
> so I tried
> In[3]:= StringCases[list, ___~~"END:"~~___]
> Out[3]= {{"DTEND:19260412T175900"},{"DTEND:20070207T050000"},{"END:VCALENDAR"},{}}
> The problem here is that empty elements can be returned.
That is easily fixed by doing either

In[13]:= DeleteCases[StringCases[list, ___ ~~ "END:" ~~ ___], {}]

Out[13]= {{"DTEND:19260412T175900"}, {"DTEND:20070207T050000"},
{"END:VCALENDAR"}}

or

In[14]:= StringCases[list, ___ ~~ "END:" ~~ ___] /. {} -> Sequence[]

Out[14]= {{"DTEND:19260412T175900"}, {"DTEND:20070207T050000"},
{"END:VCALENDAR"}}
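Both forms above still return a list of one-element lists. If a flat list of the matching strings is what is wanted, Flatten should handle both problems at once, because flattening also drops the empty sublists. A sketch, assuming the list from In[1]; with this pattern each string contributes at most one match, as Out[3] shows. The name nested is illustrative.

```mathematica
list = {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"};

(* One sublist per input string, {} where nothing matched; nested is an illustrative name *)
nested = StringCases[list, ___ ~~ "END:" ~~ ___];

(* Flatten removes the nesting, and the empty sublists disappear with it *)
Flatten[nested]
(* {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"} *)
```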
> So next I tried
> In[4]:= Select[list, ___~~"END:"~~___]
> Out[4]= {}
> Obviously not working.
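Here, as with Cases, a pure function using StringMatchQ will do
what you need. Select applies its second argument to each element
as a function and keeps the element only if the result is True; a
string pattern applied to a string never evaluates to True, so
nothing is selected. That is,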

In[15]:= Select[list, StringMatchQ[#, ___ ~~ "END:" ~~ ___] &]

Out[15]= {DTEND:19260412T175900,DTEND:20070207T050000,END:VCALENDAR}
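A small sketch of what Select sees in In[4]: the pattern is applied to each string as if it were a function, no rule fires, and the unevaluated result is not True. The name crit is illustrative only.

```mathematica
(* crit is an illustrative name for the string pattern from In[4] *)
crit = ___ ~~ "END:" ~~ ___;

(* Applying the pattern like a function: no rule applies, so the call stays unevaluated *)
crit["END:VCALENDAR"]

(* That unevaluated expression is not True, so Select discards the element *)
TrueQ[crit["END:VCALENDAR"]]            (* False *)

(* Wrapping the pattern in StringMatchQ gives a genuine True/False test *)
StringMatchQ["END:VCALENDAR", crit]     (* True *)
```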
> Next I tried
> In[5]:= Select[ list, StringMatchQ[#, "*END:*"] & ]
> Out[5]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}
> This is fine. But what if I only want the "END:" lines and not the
> "DTEND:" lines?
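Change the pattern to be matched. Since StringMatchQ tests the entire
string, a pattern that must start with "END:" excludes the "DTEND:"
lines. For example,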

In[16]:= Select[list, StringMatchQ[#, "END:" ~~ ___] &]

Out[16]= {END:VCALENDAR}
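Applying the test from In[16] to every element may make the behaviour clearer: StringMatchQ compares the pattern with the whole string, and the "DTEND:" strings begin with "DT", so they cannot match a pattern that starts with "END:". A sketch using the list from In[1]; the abbreviated form "END:*" should give the same answers, since StringMatchQ reads "*" as zero or more characters.

```mathematica
list = {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"};

(* One Boolean per element: only a string that starts with "END:" matches *)
Map[StringMatchQ[#, "END:" ~~ ___] &, list]
(* {False, False, True, False} *)

(* The abbreviated-pattern spelling of the same test *)
Map[StringMatchQ[#, "END:*"] &, list]
(* {False, False, True, False} *)
```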
> It may be appropriate to make use of
> In[6]:= Select[ list, StringFreeQ[#, "*DTEND:*"] & ]
> Out[6]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR",
> "MM"}
> Not as expected!
Since StringFreeQ[string, pattern] already tests every substring of
string against pattern (returning True only when no substring
matches), it isn't sensible to supply a pattern like
___~~pattern~~___. This just causes Mathematica to do more work than
needed to achieve the desired result. So, do

In[17]:= Select[list, StringFreeQ[#, "DTEND:"] &]

Out[17]= {END:VCALENDAR,MM}
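If the goal is exactly what In[7] expresses, lines that contain "END:" but not "DTEND:", the same substring tests can be combined without any wildcards. A sketch using the list from In[1]:

```mathematica
list = {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"};

(* Keep a line if it contains no "DTEND:" but does contain "END:" *)
Select[list, StringFreeQ[#, "DTEND:"] && ! StringFreeQ[#, "END:"] &]
(* {"END:VCALENDAR"} *)
```

The second test is what drops "MM", which In[17] keeps.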

Also, note that the documentation for StringMatchQ, under More
Information, states "... ordinary StringExpression string patterns,
as well as abbreviated string patterns containing the following
metacharacters:" and specifically says that "*" is interpreted as
zero or more characters. The documentation for StringFreeQ has no
similar statement, so I suspect that StringFreeQ takes a "*" to be a
literal asterisk. None of the strings in your list contains a
literal asterisk, so if StringFreeQ is reading the "*" characters in
your patterns literally, every string would be selected, which is
exactly what you got.
> In[7]:= Select[ list, StringMatchQ[#, "*END:*"] && ! StringMatchQ[#,
> "*DTEND:*"] & ]
> Out[7]= {"END:VCALENDAR"}
> I know I could have used "END*" instead of "*END*", but that's not
> the point here.
> My questions then are: Why doesn't Cases work for a list of strings?
> Why doesn't Select work for patterns with the ~~ operator?
Neither Cases nor Select is designed to use string patterns. You
can use string patterns with them by building a pattern test or a
function that evaluates to True or False using any of the functions
that do accept string patterns, such as StringMatchQ or StringFreeQ.
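Pick is another way to follow this recipe, since StringMatchQ also accepts a whole list of strings and returns one True or False per element; DeleteCases with the same test as a PatternTest gives the complement. A sketch, assuming the list from In[1]; mask is an illustrative name.

```mathematica
list = {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"};

(* StringMatchQ on a list of strings gives a list of Booleans; mask is illustrative *)
mask = StringMatchQ[list, "END:" ~~ ___];   (* {False, False, True, False} *)

(* Pick keeps the elements of list at the positions where mask is True *)
Pick[list, mask]
(* {"END:VCALENDAR"} *)

(* DeleteCases with the same test as a PatternTest removes those elements instead *)
DeleteCases[list, _?(StringMatchQ[#, "END:" ~~ ___] &)]
(* {"DTEND:19260412T175900", "DTEND:20070207T050000", "MM"} *)
```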
> Why doesn't StringFreeQ act in the same way as !StringMatchQ?
Why are you expecting these to be the same? StringFreeQ[string,
pattern] returns true whenever no substring of string matches
pattern. !StringMatchQ[string, pattern] returns true whenever
the entire string fails to match pattern. There is a clear
difference between matching a substring of a given string and
the entire string.
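The difference shows up on a single "DTEND:" line. For simple patterns like the ones in this thread, padding the pattern with ___ on both sides turns the whole-string test into a substring test, which is what the "*...*" in In[7] does. A sketch; s is an illustrative name.

```mathematica
(* s is an illustrative name for one of the "DTEND:" lines *)
s = "DTEND:19260412T175900";

(* Substring test: "DTEND:" occurs inside s, so s is not free of it *)
StringFreeQ[s, "DTEND:"]                      (* False *)

(* Whole-string test: s as a whole is not the string "DTEND:" *)
! StringMatchQ[s, "DTEND:"]                   (* True *)

(* Padding the pattern with ___ makes the whole-string test agree with StringFreeQ *)
! StringMatchQ[s, ___ ~~ "DTEND:" ~~ ___]     (* False *)
```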
