
Coder Needed For Complex Web Scraping Script

$100-500 USD

Cancelled
Posted over 14 years ago


Paid on delivery
This will be a multi-part script that will:

1. Record the project name and data fields
2. Learn data locations via a web-based interactive script
3. Retrieve data automatically
4. Report any errors in the retrieval process

The scraped data will need to be incorporated into a MySQL database for data extraction by an existing website. New pages will be needed for this as a secondary project.

## Deliverables

I need a complex web scraper built for me. I say complex because it will be required to pull data from many websites with different layouts.

The first task for the winning bidder will be to create an input file of URLs from my existing database. Each of these URLs will be the home page of one of the sites we will be collecting data from. This input file creation should be very simple, as my current website displays these URLs on one of my pages. The file will contain two pieces of data: the website's unique number and the URL of the website's home page.

There will be four parts to the actual scraper script.

The first part will work with a user to name a project and all of the data fields that will need to be captured. For my first project with the script that you will build, there might be 8 to 12 pieces of data that will need to be collected from each site, and they may reside on multiple pages. Each of these data fields will need to be given a unique name. So, I might call the project "toy prices" and the 8 data fields might be "mattel-truck", "Hess-truck", "dump-truck", etc.

The second part of the script will work as a web-based interactive program. In this part of the script, each data field's location at every website in the input file (both the URL and the exact location on the page) will be recorded by the script with the help of a user. The script will start by reading from the input file of URLs one at a time and display the home page of the first site in a work box on the user's screen.
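The input-file step could be sketched as below, assuming a simple CSV format; the sample sites are hypothetical stand-ins for the pairs that would come from the client's existing MySQL database:

```python
import csv

# Hypothetical sample sites; in the real script these pairs would be
# pulled from the client's existing MySQL database.
sites = [
    (1, "http://example-toy-store.com/"),
    (2, "http://example-hobby-shop.com/"),
]

def write_input_file(sites, path):
    """Write one (website unique number, home page URL) pair per row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(sites)

def read_input_file(path):
    """Read the input file back as (int site number, URL) pairs."""
    with open(path, newline="") as f:
        return [(int(num), url) for num, url in csv.reader(f)]

write_input_file(sites, "sites.csv")
```

The interactive second part of the script would then iterate over `read_input_file("sites.csv")` to display each home page in turn.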
By "work box" I mean that part of the screen will be for the user to communicate with the script (like the header and left-hand column), while the rest of the screen will show the actual website's data screen. The user will then go through each of the data fields needed from this site one by one and define the URL and exact page location on the screen, so that the script can record this information for each of the fields for later automatic retrieval in part three of the script. In order to do this, the user must be able to change the URL (navigate from the home page) to get to the proper URL where the data resides. The user will select each of the data fields (maybe they will all show in the left-hand column of the user's screen) one at a time and then highlight (select) the data field on the website. From the user's highlighting of the data field, the script must be able to record each data field's exact position, so that in the end, for every data field at every website we want to collect data from, the script will learn and create a record. The record layout will look something like this:

Positions:
1-6     website unique number
7-29    data-field-1-name
30-60   data-field-1-name-description (text/decimal/size)
61-90   data-field-1-name-url
91-119  data-field-1-name-page-location (starting row/column)
121-130 current date of data collection
131-140 exact time of data capture
141-150 data-field-1-data
151-180 error-message-if-any (blank if none)

So, if there were 1,000 websites to collect data from and 8 pieces of data to collect from each, we should have 8,000 records in the project file that show the exact location of each piece of data and the data itself, along with any error message there might be if the data could not be collected (i.e., the URL was no good, or the data was supposed to be decimal but the script found text, etc.). All of these 8,000 records will be recorded/written during the user-interactive second section of the script.
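The fixed-width layout above could be handled with a pack/unpack pair along these lines; the field names are my own shorthand for the positions listed (note the spec leaves position 120 unused):

```python
# Field layout from the spec: (name, start, end), 1-based inclusive positions.
LAYOUT = [
    ("site_number",        1,   6),
    ("field_name",         7,  29),
    ("field_description", 30,  60),
    ("field_url",         61,  90),
    ("page_location",     91, 119),
    ("capture_date",     121, 130),
    ("capture_time",     131, 140),
    ("field_data",       141, 150),
    ("error_message",    151, 180),
]

def unpack_record(line):
    """Slice a 180-character fixed-width record into named fields."""
    return {name: line[start - 1:end].strip() for name, start, end in LAYOUT}

def pack_record(fields):
    """Build a 180-character record; each value is placed at its position."""
    buf = [" "] * 180
    for name, start, end in LAYOUT:
        value = fields.get(name, "")[: end - start + 1]
        buf[start - 1:start - 1 + len(value)] = value
    return "".join(buf)
```

With 1,000 sites and 8 fields each, the project file would simply be 8,000 of these 180-character lines.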
Also, with this file you can see how we could selectively go out and scrape the data for just one website, or go to every website and just gather data-field-2 from all of them, etc. It will be able to do this because in section three of the script, the automated retrieval of the data, it will first read an input record that contains the information it will use to determine exactly what to do. This auto-update section will need to run as a cron-type job. The third section's auto-update record will look something like this:

Position:
1-9   starting website number
10-20 ending website number
30    if position 30 contains a 1, get data-field-1; if it is zero, do not
31    if position 31 contains a 1, get data-field-2; if it is zero, do not
32    if position 32 contains a 1, get data-field-3; if it is zero, do not
33-38 likewise, all the way through data-field-8

From this record we can see that if the starting website number is 1, the ending number is equal to the last website, and all of the data-field characters are set to 1, then the script will go and retrieve all 8 data fields from all 1,000 websites.

The fourth section of the script will be the exception reporting. During the auto-update cycle, any time the script encounters an error, a message should be written on the record as well as to an error report. This error report will describe the error as well as possible, so that a user can use section two of the script to correct the defined position for the error that was encountered.
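The control record for the cron-driven third section might be decoded as in this sketch; I read the flag characters as occupying positions 30-37, one per data field, which is an assumption since the spec's listing runs through position 38:

```python
def parse_control_record(record):
    """Decode an auto-update control record.

    Positions 1-9:   starting website number
    Positions 10-20: ending website number
    Positions 30-37: one flag per data field (1 = fetch, 0 = skip) --
                     an assumed reading; the spec lists through position 38.
    """
    start = int(record[0:9])
    end = int(record[9:20])
    flags = [record[29 + i] == "1" for i in range(8)]
    return start, end, flags

# Example: fetch all 8 data fields from websites 1 through 1000.
record = "000000001" + "00000001000" + " " * 9 + "11111111"
start, end, flags = parse_control_record(record)
```

A cron entry would invoke the retrieval script on a schedule; for each site in `range(start, end + 1)` it would fetch only the flagged fields, writing any failure both into the 180-character record's error field and to the separate error report described in section four.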
Project ID: 3053946

About the project

5 proposals
Remote project
Active 14 years ago

5 freelancers are bidding an average of $343 USD for this job
See private message.
$425 USD in 14 days
4.8 (222 reviews)
9.6

See private message.
$382.50 USD in 14 days
5.0 (2 reviews)
1.9

See private message.
$313.65 USD in 14 days
2.8 (8 reviews)
1.3

See private message.
$340 USD in 14 days
0.0 (0 reviews)
0.0

See private message.
$255 USD in 14 days
0.0 (0 reviews)
0.0

About this client

Oakland, United States
5.0
100
Payment method verified
Member since Sep 21, 2006

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)