I was looking for a way to identify URLs in text and eventually found this huge regex (http://daringfireball.net/2010/07/improved_regex_for_matching_urls). I figured I'd need to do this again, so I stuck the whole pattern into urlmarker.py and now I can just import it.

import re
import urlmarker  # the local urlmarker.py that holds the regex

text = """
The regex patterns in this gist are intended only to match web URLs -- http,
https, and naked domains like "example.com". For a pattern that attempts to
match all URLs, regardless of protocol, see: https://gist.github.com/gruber/249502
"""

# Collect every URL the pattern finds in the text
print(re.findall(urlmarker.WEB_URL_REGEX, text))

This prints ['example.com', 'https://gist.github.com/gruber/249502']
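
If you also want to know where each URL sits in the text (say, to wrap it in a link), one option is to compile the pattern once and use finditer instead of findall. Here is a minimal sketch of that idea. Note that SIMPLE_URL_REGEX is a deliberately simplified stand-in I made up so the snippet runs on its own; it is not the regex from the post and it will not match naked domains like "example.com". The find_urls helper is likewise just an illustrative name. In practice you would presumably pass urlmarker.WEB_URL_REGEX to re.compile instead.

```python
import re

# Simplified stand-in pattern (an assumption for illustration only).
# It matches http:// and https:// links up to the next whitespace, quote,
# or angle bracket. The real pattern in urlmarker.py is far more thorough.
SIMPLE_URL_REGEX = r"https?://[^\s<>\"']+"

# Compile once so repeated searches reuse the same pattern object.
URL_PATTERN = re.compile(SIMPLE_URL_REGEX, re.IGNORECASE)


def find_urls(text):
    """Return a list of (start, end, url) tuples, one per match in text."""
    return [(m.start(), m.end(), m.group(0)) for m in URL_PATTERN.finditer(text)]


sample = "For all protocols see https://gist.github.com/gruber/249502 instead."
matches = find_urls(sample)

for start, end, url in matches:
    # The offsets slice the original string back to the matched URL.
    print(start, end, url)
```

The start and end offsets make it easy to replace or highlight each URL in place without searching for it a second time.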




Published

16 February 2015

Tags