I have 6 tweet datasets, each about one event. I want someone to perform the following tasks on them.
1. First step: pre-process the data (URL removal, stopword removal, slang removal, POS tagging, duplicate removal, spelling correction, and getting geo-coordinates; I will provide some help with the geo-coordinates).
2. Second step: cluster the tweets around the most common topics (topic clustering). Assign a topic to each tweet and store the tweets with their topic information in a dataframe for step 3. This should be unsupervised.
3. Third step: geo-cluster the tweets. If most of the tweets in a geo-cluster of some radius belong to one topic (out of the 6 topics assigned in step 2), change the topic of each tweet in that cluster from the one given in step 2 to the most common topic of the geo-cluster.
4. Fourth step: since the datasets are labelled, finally evaluate the system with a performance matrix for each dataset (event).
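To make step 3 concrete, here is a minimal sketch of the relabelling rule I have in mind. It is an illustration under assumptions, not a required implementation: the function names `haversine_km` and `relabel_by_geo_majority`, the parameters `radius_km` and `majority`, and the sample coordinates are all hypothetical. It uses a simple fixed-radius neighbourhood around each tweet instead of a real clustering algorithm, so a density-based method could be substituted.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def relabel_by_geo_majority(coords, topics, radius_km=5.0, majority=0.5):
    """Return a new topic list after geo-based relabelling.

    For each tweet, collect the step-2 topics of all tweets within
    radius_km (the tweet itself included). If one topic accounts for
    more than `majority` of them, the tweet takes that topic.
    Both thresholds are assumed values, not taken from the task.
    """
    new_topics = list(topics)
    for i, centre in enumerate(coords):
        neighbours = [
            topics[j]
            for j, other in enumerate(coords)
            if haversine_km(centre, other) <= radius_km
        ]
        topic, count = Counter(neighbours).most_common(1)[0]
        if count / len(neighbours) > majority:
            new_topics[i] = topic
    return new_topics


# Made-up sample: three tweets close together, one far away.
coords = [(51.50, -0.12), (51.51, -0.13), (51.52, -0.11), (48.85, 2.35)]
topics = [0, 0, 3, 3]
print(relabel_by_geo_majority(coords, topics))
```

In this sample the third tweet sits among two topic-0 tweets, so it is moved to topic 0, while the isolated fourth tweet keeps its step-2 topic. The pairwise distance loop is quadratic in the number of tweets, so a spatial index would be needed for large datasets.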
Time: 5 days
12 freelancers are bidding an average of $183 for this job.
I am a data scientist with experience in machine learning and statistical analysis of data using R and Python. I also have experience with big data technologies such as Spark and Hadoop. I would like to do the project.
I have experience in Twitter data mining, tweet preprocessing, tweet clustering and tweet classification, gained while helping my colleague with his dissertation.