Text- and location-based clustering for Twitter data - Repost - open to bidding

  • Status: Closed
  • Budget: $30 - $250 USD
  • Total bids: 12

Project description

I have 6 tweet datasets, each about one event. I want someone to perform the following tasks on them.

1. First step: pre-process the data (URL removal, stopword removal, slang removal, POS tagging, duplicate removal, spelling correction, and getting geo-coordinates, for which I will provide some help).

2. Second step: cluster the tweets around the most common topics (topic clustering). Assign a topic to each tweet and store the tweets with their topic information in a dataframe for step 3. This step should be unsupervised.

3. Third step: geo-cluster the tweets. If most of the tweets in a geo-cluster of some radius belong to one topic (out of the 6 topics assigned to the tweets in step 2), change the topic of each tweet in that cluster from the one given in step 2 to the most common topic of the geo-cluster.

4. Fourth step: because the datasets are labelled, finally evaluate the system with a performance matrix for each dataset (event).
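
To make the steps above more concrete, here is a minimal sketch of parts of steps 1, 3 and 4 using only the Python standard library. It is an illustration under assumptions, not a required design: all function names are hypothetical, the stopword list is a tiny placeholder, the 5 km radius and the "more than 50%" majority rule are guesses at what "some radius" and "most of the tweets" could mean, and the geo-cluster is approximated by each tweet's radius neighbourhood rather than by a dedicated clustering algorithm. Step 2 is not shown; the sketch assumes it has already produced one integer topic id per tweet, and that these ids have been mapped to the dataset labels before evaluation.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Tiny illustrative stopword list (assumption); a real run would use a full one.
STOPWORDS = {"a", "an", "the", "is", "in", "on", "at", "of", "to", "and"}


def clean_tweet(text):
    """Step 1 (partial): strip URLs, lowercase, drop stopwords."""
    text = URL_RE.sub(" ", text.lower())
    tokens = re.findall(r"[a-z0-9#@']+", text)
    return " ".join(t for t in tokens if t not in STOPWORDS)


def drop_duplicates(texts):
    """Step 1 (partial): keep the first occurrence of each text."""
    seen, unique = set(), []
    for t in texts:
        if t not in seen:
            seen.add(t)
            unique.append(t)
    return unique


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def relabel_by_geo_majority(coords, topics, radius_km=5.0, min_share=0.5):
    """Step 3: give each tweet the dominant topic of its radius neighbourhood."""
    new_topics = []
    for i, p in enumerate(coords):
        # Topics of all tweets within the radius, including tweet i itself.
        nearby = [topics[j] for j, q in enumerate(coords)
                  if haversine_km(p, q) <= radius_km]
        topic, count = Counter(nearby).most_common(1)[0]
        # Relabel only when one topic holds a strict majority nearby.
        new_topics.append(topic if count / len(nearby) > min_share else topics[i])
    return new_topics


def confusion_counts(true_labels, predicted):
    """Step 4: count (true, predicted) pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(true_labels, predicted))


# Usage: three tweets close together and one far away (made-up coordinates).
coords = [(40.7128, -74.0060), (40.7130, -74.0055),
          (40.7127, -74.0062), (34.0522, -118.2437)]
step2_topics = [0, 0, 1, 2]
final_topics = relabel_by_geo_majority(coords, step2_topics, radius_km=1.0)
```

The relabelling reads only the original step 2 topics, so the result does not depend on the order in which tweets are visited. The pairwise distance loop is quadratic in the number of tweets, which is fine for a sketch but would likely need a spatial index or a library clustering routine on larger datasets.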

Time: 5 days
