Crawling a website for data, converting data to standardized datafile

$30-5000 USD

Completed
Posted almost 15 years ago

Paid on delivery
A website presents some numerical data. I want a crawler to extract that data and transform it into one or more standardized data files, preferably in Excel format but, if necessary, in CSV or a similar format.

## Deliverables

{Please note: this request is related to, but not the same as, my other work request # 1188594. Those who bid on that project may bid on this one, and bidders on this project may also want to bid on the other one.}

The website [login to view URL] has numerical information on over 80,000 Facebook applications that I want extracted and transformed into numerical format. The final deliverable will be a data file, preferably a single file but multiple if necessary, that contains all of that data. I prefer the file(s) to be in Excel format, but CSV or another nonproprietary data format is also acceptable.

You can obtain a standard login username/password from the site for free. Using that standard login, you would crawl all of the apps, starting from the following anchor page: [login to view URL]

Note: I have been informed of an unfortunate pagination bug in the website, which you can see here: while the anchor page claims to display apps 1-25, it does not actually display apps 1-25. Hence, the spider cannot simply click on "Next", because doing so would skip some apps. (If you click on Next manually, you will see what I mean.) Furthermore, the number of apps skipped seems to be unpredictable, so you cannot simply crawl using a fixed increment value within the search query. The spider should therefore be smart enough to see the number of apps that were actually displayed and then construct the proper query to display the true next set of apps, without skipping any.
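As a rough illustration of that requirement, here is a minimal Python sketch of a crawl loop that always resumes from the last app number actually displayed rather than from a fixed increment. Everything named here is hypothetical: `fetch_page` stands in for whatever code logs in, requests a listing page, and parses out the displayed app numbers, and `fake_fetch` only simulates a site that shows an unpredictable number of apps per page.

```python
from typing import Callable, Iterator, List


def crawl_without_skipping(
    fetch_page: Callable[[int], List[int]],
    max_pages: int = 10_000,
) -> Iterator[int]:
    """Yield every app number, resuming from the last one actually displayed.

    fetch_page(last_seen) is a hypothetical callable that returns the app
    numbers shown on the listing page that starts right after `last_seen`.
    """
    last_seen = 0
    for _ in range(max_pages):  # hard cap so a misbehaving site cannot loop forever
        shown = fetch_page(last_seen)
        # Trust only what was really displayed, not what the page claims.
        new = [n for n in shown if n > last_seen]
        if not new:
            return  # nothing new was displayed: the listing is exhausted
        yield from new
        last_seen = max(new)


# Simulated site (an assumption for the demo): 60 apps in total, and each
# page shows an irregular number of them instead of a fixed 25.
TOTAL_APPS = 60


def fake_fetch(last_seen: int) -> List[int]:
    page_size = 5 + (last_seen * 7) % 13  # irregular but deterministic
    first = last_seen + 1
    return list(range(first, min(first + page_size, TOTAL_APPS + 1)))


crawled = list(crawl_without_skipping(fake_fetch))
```

Because the next query is derived from what was observed on the previous page, the loop does not depend on the page size the site advertises.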
For example, if apps 1-17 are actually displayed (as opposed to the apps 1-25 that the site claims to display), then the next query could be: [login to view URL]

Basically, you would append the string ?0=x, where x = the number of the last application on the previous search page. Or, if you have a better idea, feel free to use it. What is important is that the crawler not skip any apps. Again, if that is not clear, playing with the site should clarify the matter.

*The Final Output Data

I want all data fields and, importantly, all the information from the JavaScript graphs that the crawler can see. For example, let's consider the Top Friends app: [login to view URL]

With the free standard login, you will see that the information in the Summary, Reach, and Audience Profile tabs is available (the info in the Engagement and Growth tabs will be grayed out).

From the Summary tab, I want the data regarding:
- By Company Name (for example, RockYou, Slide, etc.)
- Rank
- DAU
- Social Graph Influence
- MAU
- Categories
- Description
- The entire Unique Active Users graphs for daily, weekly, and monthly (where x = date, y = UAU). Note: while the graph is Adobe Flash, all of the data is viewable in the Page Source.

From the Reach tab, I want:
- DAU
- MAU
- The entire UAU graph (just like above)

From the Audience Profile tab, I want:
- Male/Female
- Average Age
- Average Number of Friends
- Gender
- App User Overlap (all of the fields)
- App User Affinity (all of the fields)
- Age (all of the categories in the histogram)
- Social Graph Influence (all of the categories in the histogram)

Note: some of the data will be repetitive. I don't care; I just want to make sure that the data is complete, even if some of it is repetitive.

Important: many of the apps won't have all of these tabs or all of the fields. If the crawler can't find a tab or field for a particular app, it should just input a "-" string into the data file.
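The missing-field rule could be handled along these lines when writing the CSV variant of the output. This is only a sketch: the column names in `FIELDS` are hypothetical placeholders for a few of the fields listed above, and the sample record uses a made-up DAU value purely for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical column names; the real file would carry every requested field.
FIELDS = ["app_name", "company", "rank", "dau", "mau", "male_female", "average_age"]
MISSING = "-"  # placeholder the posting asks for when a tab or field is absent


def to_row(record: Dict[str, object], fields: List[str] = FIELDS) -> List[str]:
    """Return one output row, substituting "-" for any absent or empty field."""
    row = []
    for name in fields:
        value = record.get(name)
        row.append(MISSING if value is None or value == "" else str(value))
    return row


def write_csv(records: Iterable[Dict[str, object]], out: TextIO,
              fields: List[str] = FIELDS) -> None:
    """Write a header line followed by one line per app record."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for record in records:
        writer.writerow(to_row(record, fields))


# Demo with an invented record: only two of the seven fields were found.
buffer = io.StringIO()
write_csv([{"app_name": "Top Friends", "dau": 123}], buffer)
lines = buffer.getvalue().splitlines()
```

Keeping a fixed column list means every app produces a row of the same width, so apps with missing tabs still line up in Excel or any CSV reader.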
Project ID: 2798068

About the project

14 proposals
Remote project
Active 15 years ago

Awarded to:
See private message.
$127.50 USD in 14 days
5.0 (64 reviews)
5.5
14 freelancers are bidding an average of $243 USD for this job
See private message.
$191.25 USD in 14 days
4.9 (468 reviews)
7.5
See private message.
$127.50 USD in 14 days
4.8 (27 reviews)
5.3
See private message.
$85 USD in 14 days
5.0 (67 reviews)
5.0
See private message.
$65.45 USD in 14 days
4.5 (91 reviews)
5.4
See private message.
$552.50 USD in 14 days
5.0 (58 reviews)
5.0
See private message.
$59.50 USD in 14 days
4.7 (63 reviews)
4.9
See private message.
$102 USD in 14 days
4.8 (19 reviews)
4.1
See private message.
$51 USD in 14 days
5.0 (4 reviews)
2.7
See private message.
$170 USD in 14 days
5.0 (3 reviews)
1.3
See private message.
$425 USD in 14 days
0.0 (0 reviews)
0.0
See private message.
$850 USD in 14 days
0.0 (0 reviews)
0.0
See private message.
$170 USD in 14 days
0.0 (0 reviews)
0.0
See private message.
$425 USD in 14 days
0.0 (0 reviews)
0.0

About this client

Cambridge, United States
5.0
1
Member since Mar 4, 2009

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)