https://github.com/ScriptProdigy/CrawlTwitNumbers
Being a Twitter user myself, I see a LOT of my friends post their phone numbers on Twitter. So a couple of weeks ago I had the idea to data mine those numbers. I'm using Twitter's public streaming API to collect them. Twitter provides an extensive API that lets programmers do all sorts of things, data mining included; it's pretty awesome, really. Anyway, check out the GitHub repo for the source and read the README to start it up yourself. It isn't finished yet, but expect a version that crawls for basic patterns within a few hours of this post. Once I get the basics down, I'll probably add a bunch of features, like logging the person's location.
- 20 Apr. 2013 07:00pm #1
CrawlTwitNumbers - My Latest Project
- 20 Apr. 2013 07:22pm #2
- 21 Apr. 2013 01:30am #3
You're just grabbing random number sequences from tweets. You need to improve that regular expression pattern, haha.
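For illustration, here is a rough Python sketch of the difference. A bare `\d+` pattern matches any run of digits, dates and counts included. The stricter pattern below assumes 10-digit North American style numbers (optionally with parentheses around the area code and `-`, `.` or space separators) and uses lookarounds so it won't match inside a longer run of digits. The names `NAIVE`, `STRICT` and `find_candidates` are hypothetical; this is not the pattern the repository actually uses.

```python
import re

# Naive pattern: any run of digits, so "20", "2013" or an 11-digit run all match.
NAIVE = re.compile(r"\d+")

# Stricter sketch (assumption: 10-digit North American style numbers such as
# 555-123-4567, (555) 123-4567 or 5551234567).
# (?<!\d) and (?!\d) stop a match from starting or ending inside a longer digit run.
STRICT = re.compile(r"(?<!\d)\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}(?!\d)")


def find_candidates(text):
    """Return the substrings of text that look like 10-digit phone numbers."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return STRICT.findall(text)


if __name__ == "__main__":
    tweet = "posted 20 Apr 2013, text me at (555) 123-4567"
    print(NAIVE.findall(tweet))
    print(find_candidates(tweet))
```

With this pattern, `11122233334` yields no match because of the trailing 4, while `1112223333hmuuuu` still yields `1112223333`.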
Also, here's a .gitignore:
```
*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64
__pycache__

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject
```
- 21 Apr. 2013 02:14am #4
Make it check whether the number is longer than 10 characters, and things like that, to cut down on false positives. For example, make it grab the entire chunk that has no spaces.
(E.g., if I post 11122233334, your script will take the entire sequence and reject it because of the extra 4. If it sees 1112223333hmuuuu, it will still take in the whole chunk but save only the digits, because it's still the correct number of digits.)
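The chunk-based filter suggested above could be sketched in Python like this. It assumes a valid number has exactly 10 digits and that a "chunk" is a whitespace-separated token; `extract_numbers` and its `expected_digits` parameter are hypothetical names, not taken from CrawlTwitNumbers.

```python
import re


def extract_numbers(tweet, expected_digits=10):
    """Return the digits of each whitespace-separated chunk that has exactly expected_digits digits.

    Hypothetical helper: the 10-digit default is an assumption (North American numbers).
    """
    results = []
    for chunk in tweet.split():  # split on any whitespace, like "a chunk with no spaces"
        digits = re.sub(r"[^0-9]", "", chunk)  # keep only the ASCII digits in this chunk
        if len(digits) == expected_digits:  # too many or too few digits: reject the chunk
            results.append(digits)
    return results


if __name__ == "__main__":
    print(extract_numbers("11122233334"))
    print(extract_numbers("1112223333hmuuuu"))
```

`extract_numbers("11122233334")` returns an empty list, while `extract_numbers("1112223333hmuuuu")` returns `["1112223333"]`. One trade-off of splitting on whitespace is that numbers written with internal spaces, such as `(555) 123-4567`, get split across chunks and are rejected.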
- 21 Apr. 2013 02:23am #5
- 15 May. 2013 11:34pm #6
- 29 May. 2013 01:51am #7