Geolocation Twitter


Todo

  • Thread twitter save?
  • Web interface

How to set up the environment

  1. Set these environment variables:
TWITTER_CONSUMER_KEY
TWITTER_CONSUMER_SECRET
TWITTER_ACCESS_TOKEN
TWITTER_ACCESS_TOKEN_SECRET
  2. Install PostgreSQL.
  3. Set up the database:
$ bash database/setup.sh
  4. Install the Python dependencies:
$ pip3 install virtualenv && virtualenv env && source env/bin/activate && pip3 install -r requirements.txt
  5. Start data mining using your filters:
$ bash startup/init.sh start <filter 1> <filter 2> ...
  6. Check which filters are actually allowed by Twitter:
$ bash startup/init.sh status
  7. When done, stop the processes:
$ bash startup/init.sh stop
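
A missing credential is easy to overlook until the stream fails to authenticate. The sketch below checks the four environment variables from the first step before anything else runs. The helper name `load_twitter_credentials` is hypothetical and not part of this repository; only the variable names come from the list above.

```python
import os

# The four variable names are the ones listed in the setup steps.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is unset or empty.

    Hypothetical helper for illustration; not part of the repository.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_twitter_credentials()` at startup would then fail early with a message naming every variable that still needs to be set.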

Assumptions about Twitter's API

  • retweeted_id --> the tweet is a retweet
  • in_reply_to_user_id --> someone has been mentioned
  • in_reply_to_status_id --> the tweet is a reply to another tweet
  • original_tweet_retweet_count --> only present on commented retweets
  • A retweet is a "pure" retweet if the attribute "original_tweet_retweet_count" is not null
  • A retweet is a "commented" retweet if the attribute "original_tweet_retweet_count" is null
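
As a minimal sketch, the rules above can be written as a small classifier. It assumes a tweet is available as a dict keyed by the attribute names in the list, and it applies the last two rules for telling the two retweet kinds apart. The function name `classify_tweet` and the label strings are hypothetical, chosen only for illustration.

```python
def classify_tweet(row):
    """Label a tweet according to the assumptions listed above.

    `row` is assumed to be a dict keyed by attribute name; a missing key
    is treated the same as null. Hypothetical helper, illustrative labels.
    """
    labels = []
    if row.get("retweeted_id") is not None:
        # Per the last two rules: non-null count means "pure", null means "commented".
        if row.get("original_tweet_retweet_count") is not None:
            labels.append("pure retweet")
        else:
            labels.append("commented retweet")
    if row.get("in_reply_to_status_id") is not None:
        labels.append("reply")
    if row.get("in_reply_to_user_id") is not None:
        labels.append("mention")
    # A tweet matching none of the rules gets a fallback label (an assumption).
    return labels or ["original tweet"]
```

A tweet can carry several labels at once, which matches the way the statistics queries count every combination of the three id columns.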

Statements for extracting statistics

SELECT count(*) FROM tweets WHERE retweeted_id IS NOT NULL AND in_reply_to_status_id IS NOT NULL AND in_reply_to_user_id IS NOT NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NOT NULL AND in_reply_to_status_id IS NOT NULL AND in_reply_to_user_id IS NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NOT NULL AND in_reply_to_status_id IS NULL AND in_reply_to_user_id IS NOT NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NOT NULL AND in_reply_to_status_id IS NULL AND in_reply_to_user_id IS NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NULL AND in_reply_to_status_id IS NOT NULL AND in_reply_to_user_id IS NOT NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NULL AND in_reply_to_status_id IS NOT NULL AND in_reply_to_user_id IS NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NULL AND in_reply_to_status_id IS NULL AND in_reply_to_user_id IS NOT NULL;
SELECT count(*) FROM tweets WHERE retweeted_id IS NULL AND in_reply_to_status_id IS NULL AND in_reply_to_user_id IS NULL;
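
The eight statements cover every combination of the three id columns being null or not null. One possible alternative, sketched below, is a single grouped query that returns one row per combination. The example runs against an in-memory SQLite table with made-up rows so it is self-contained; the real project uses PostgreSQL, where the flag columns would come back as booleans instead of 0/1. Unlike the separate statements, a combination with no tweets produces no row at all.

```python
import sqlite3

# One row per (is_retweet, is_reply, has_mention) combination that occurs.
GROUPED_COUNTS = """
SELECT retweeted_id IS NOT NULL          AS is_retweet,
       in_reply_to_status_id IS NOT NULL AS is_reply,
       in_reply_to_user_id IS NOT NULL   AS has_mention,
       count(*)                          AS n
FROM tweets
GROUP BY 1, 2, 3
ORDER BY 1 DESC, 2 DESC, 3 DESC;
"""

# Throwaway table holding only the three columns used by the queries above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (retweeted_id, in_reply_to_status_id, in_reply_to_user_id)"
)
# Made-up sample rows, not real data.
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, None, None),     # retweet only
        (2, None, None),     # retweet only
        (None, 10, 20),      # reply with a mentioned user
        (None, None, 30),    # mention only
        (None, None, None),  # none of the three
    ],
)

rows = conn.execute(GROUPED_COUNTS).fetchall()
for is_retweet, is_reply, has_mention, n in rows:
    print(is_retweet, is_reply, has_mention, n)
```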

Select all filtered locations:

SELECT user_location, name, ratio, country_code FROM users INNER JOIN filtered_user_locations USING (user_id) INNER JOIN geonames USING (geonameid);
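
To see what this join returns, here is a small sketch that runs the same query against throwaway SQLite tables. Only the table names, join keys, and selected columns come from the query; which table holds which column, the column types, and all sample rows are assumptions made for illustration.

```python
import sqlite3

QUERY = """
SELECT user_location, name, ratio, country_code
FROM users
INNER JOIN filtered_user_locations USING (user_id)
INNER JOIN geonames USING (geonameid);
"""

conn = sqlite3.connect(":memory:")
# Assumed minimal schemas; the real tables are likely to have more columns.
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_location TEXT);
CREATE TABLE geonames (geonameid INTEGER PRIMARY KEY, name TEXT, country_code TEXT);
CREATE TABLE filtered_user_locations (user_id INTEGER, geonameid INTEGER, ratio REAL);

-- Made-up sample rows.
INSERT INTO users VALUES (1, 'lund, sweden'), (2, 'somewhere');
INSERT INTO geonames VALUES (100, 'Lund', 'SE');
INSERT INTO filtered_user_locations VALUES (1, 100, 0.9);
""")

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because both joins are inner joins, a user without a row in filtered_user_locations (user 2 in the sample data) does not appear in the result.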