Extract Legitimate Links or URLs from a Collection Using VB.NET

Good morning all!

I had a quiet evening. My youngest daughter called from camp again and is still begging to come home. Jean is supposed to go get her today, but we shall see. The nurse said she is having fun, so who knows what might happen.

I spent a good portion of the evening organizing my music files for my Zune. Up to this point I have just thrown everything into a single playlist and hit the shuffle button. But yesterday I went out and bought an exercise machine (no, nothing to do with me turning... 29... LOL) and realized I would need a playlist of music that would go well with a workout. (By the way, if you have suggestions for that kind of music, I would love to hear them.) So I began organizing my songs into different playlists. Wow, you never realize how much music you have till you start going through it all! As I was playing snippets of lots of the songs, of course Jean just looked at me like I was crazy. Everything from the Beastie Boys to Streisand... I know, I know, fit me for a straitjacket!

Anyway, the task before me, which seemed simple, was to get all legitimate links from a collection. As you know, you can get a lot of crap links when looking in your Internet history that start with "JavaScript" or other stuff you don't necessarily want to see. So using regular expressions I came up with something. We don't even need to compare case, because the (?i) at the start of the pattern makes the match case-insensitive! It isn't a lot of code, but it was actually quite challenging. So here goes... Make it a great day!

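Imports System.Text.RegularExpressions

Public Sub ExtractLinks(ByVal sText As String)
        ' (?i) makes the match case-insensitive; \x22 is the double-quote character.
        Dim r As New Regex("(?i)href=\s*\x22*(?<save>.+?)[\s>\x22]")
        Dim mc As MatchCollection = r.Matches(sText)
        For Each m As Match In mc
            ' The named group "save" holds the link target itself.
            Dim s As String = m.Groups("save").Value
            ' Skip in-page anchors, i.e. links that begin with "#".
            If s.IndexOf("#") <> 0 Then
                ' Handle the link here (this part is not shown in the post).
            End If
        Next
End Sub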
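
As a rough sketch of how the idea could be finished off (this is not the code from the post, just one possible way to do it), the same pattern can feed a function that returns the links as a list. The module name LinkExtractorSketch, the helper GetLegitimateLinks, and the sample HTML are all made up for illustration, and the extra check for "javascript:" links is an assumption based on the description above rather than something the original code shows.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkExtractorSketch

    ' Same pattern as in the post: case-insensitive href=, optional quote,
    ' then a lazy capture up to the next whitespace, ">" or quote.
    Private ReadOnly HrefPattern As New Regex("(?i)href=\s*\x22*(?<save>.+?)[\s>\x22]")

    ' Hypothetical helper (not from the post): collects the links into a list.
    Public Function GetLegitimateLinks(ByVal sText As String) As List(Of String)
        Dim links As New List(Of String)
        For Each m As Match In HrefPattern.Matches(sText)
            Dim s As String = m.Groups("save").Value
            ' Skip in-page anchors such as "#top".
            If s.StartsWith("#") Then Continue For
            ' Assumption: "legitimate" also excludes javascript: pseudo-links.
            If s.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase) Then Continue For
            links.Add(s)
        Next
        Return links
    End Function

    Sub Main()
        ' Made-up sample input with four links, two of which should be skipped.
        Dim html As String = "<a href=""http://example.com/a"">A</a> " & _
                             "<a HREF=""#top"">Top</a> " & _
                             "<a href=""JavaScript:void(0)"">Click</a> " & _
                             "<a href=""/about"">About</a>"
        For Each link As String In GetLegitimateLinks(html)
            Console.WriteLine(link)
        Next
    End Sub

End Module
```

Run as a console app, this should print http://example.com/a and /about, while the #top and JavaScript: entries are skipped. Returning a list keeps the filtering separate from whatever the caller wants to do with each link.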


  1. #1 by Frank on July 24, 2008 - 9:37 am

    First comment using C#.  The implementation above is correct, but I like using IEnumerable if I can.  It gives me a way to make a call to get multiple data items while keeping my user interface thread flowing.  I use a lot of threading, so most if not all of my data is being returned from threads to a main UI, and I don't like making a call where there is a lot of churning before I get everything at once.  It gives me a way to, for example, respond to a "Cancel" button.
    IEnumerable and the "yield" statement give me a way to handle the items one at a time.
    A good explanation of yield is here: 
    Here is my version of the above code (in C# of course) and how I would implement it in an app:
    In some Utility function assembly:
    public static IEnumerable<string> ExtractLinks(string sText)
    {
        Regex r = new Regex("(?i)href=\\s*\\x22*(?<save>.+?)[\\s>\\x22]");
        MatchCollection mc = r.Matches(sText);
        foreach (Match m in mc)
        {
            // Groups is an indexer in C#, so it takes square brackets.
            string s = m.Groups["save"].Value;
            if (s.IndexOf("#") != 0)
                yield return s;
        }
        r = null;
    }
    In mainForm:
    void DoLinks()
    {
        foreach (string link in Utilities.ExtractLinks(somehumongouslinkstring))
        {
            // handle your UI functions here (Application.DoEvents(), Cancelled check, whatever).
            // I stay away from DoEvents() but it's personal preference
        }
    }
    Just some thoughts, not criticism.  Always good to learn stuff.

  2. #2 by Kelly on July 24, 2008 - 9:47 am

    No offense taken of course! A very good alternate way to do it! Thank You!

  3. #3 by Tharaa Krishna on September 27, 2012 - 1:46 pm

    Thanks for this article, this was just what I needed to read!
