Crawl, Edit and Publish

Cancelled · Posted Dec 6, 2007 · Paid on delivery

This project is about building a web application that collects information, edits it automatically, and republishes it wiki-style. The application should:

1. COLLECTING

a. Crawl a predefined list of web sites (or use Google to list all pages from the sites). Example: www.do.se.

b. Save all pages that meet certain criteria. Criteria might include keywords or a specific "form", such as all single-word titles. Example: pages containing the words "it is" in the title.

c. Save the main headings and text, the original URL, and links to images and other media. Since some sites use improper HTML structure, extra criteria may be needed to define more exactly what to fetch, e.g. based on CSS styles or similar.

d. When possible, associate each page with keywords taken from its meta keywords and from the original site structure. For example, if the site has breadcrumbs, each part of the breadcrumb trail could be saved as a keyword. Example: on a page whose breadcrumbs include the word "Lagar", that word should be saved as a keyword (the example URL was removed from the posting).

2. EDITING

a. Edit the title of each page automatically according to certain criteria. Other criteria might raise warnings for an editor to check. For instance, certain words could be removed from the title automatically, while titles containing too many words trigger an alert to the editor.

b. Remove images and replace them with a URL to the original site.

3. PUBLISHING

a. Recreate each page as a wiki page. The page title is the title generated in step 2.

b. Automatically merge multiple pages with the same title into one.

c. Provide a search engine that searches titles and keywords (probably not text content).

d. Follow the standard wiki model (like Wikipedia). All URLs should be natural-language.
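The filtering, editing, and publishing rules above can be sketched in a few functions. This is only an illustrative outline in Python (the posting leaves the platform open). The `Page` structure, the removed-word list, the five-word limit, the underscore URL style, and all sample data are assumptions made for the example and are not specified in the brief. Crawling and HTML extraction are left out.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical settings; the posting does not fix any of these values.
REQUIRED_PHRASE = "it is"          # step 1b example criterion
REMOVED_WORDS = {"welcome"}        # step 2a: words stripped from titles
MAX_TITLE_WORDS = 5                # step 2a: longer titles raise a warning


@dataclass
class Page:
    title: str
    text: str
    urls: list                      # original source URLs
    keywords: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def meets_criteria(title):
    """Step 1b: keep pages whose title contains the required phrase."""
    return REQUIRED_PHRASE in title.lower()


def edit_title(page):
    """Step 2a: drop unwanted words and flag over-long titles for an editor."""
    words = [w for w in page.title.split() if w.lower() not in REMOVED_WORDS]
    page.title = " ".join(words)
    if len(words) > MAX_TITLE_WORDS:
        page.warnings.append("title too long")
    return page


def merge_by_title(pages):
    """Step 3b: merge pages that share a title (compared case-insensitively)."""
    groups = defaultdict(list)
    for p in pages:
        groups[p.title.lower()].append(p)
    return [
        Page(
            title=group[0].title,
            text="\n\n".join(p.text for p in group),
            urls=[u for p in group for u in p.urls],
            keywords=sorted({k for p in group for k in p.keywords}),
            warnings=[w for p in group for w in p.warnings],
        )
        for group in groups.values()
    ]


def search(pages, query):
    """Step 3c: match the query against titles and keywords only."""
    q = query.lower()
    return [p for p in pages
            if q in p.title.lower() or any(q == k.lower() for k in p.keywords)]


def slugify(title):
    """Step 3d: natural-language URL path segment (spaces become underscores)."""
    return re.sub(r"\s+", "_", title.strip())


# Made-up sample pages standing in for crawler output.
pages = [
    Page("Welcome it is late", "First text.", ["http://example.com/a"], ["Lagar"]),
    Page("It is late", "Second text.", ["http://example.com/b"], ["News"]),
    Page("Contact", "Ignored.", ["http://example.com/c"]),
]
kept = [edit_title(p) for p in pages if meets_criteria(p.title)]
wiki = merge_by_title(kept)
```

In this sample the first two pages end up with the same title once "Welcome" is removed, so they are merged into a single wiki page that keeps both source URLs and both keywords. The third page is dropped at the criteria step.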

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

Open for suggestions.

Database Administration, Engineering, MySQL, PHP, Software Architecture, Software Testing, SQL, Translation, Web Hosting, Website Management, Website Testing

Project ID: #3538367

About the project

5 bids · Remote project · Active Jan 31, 2008

5 freelancers are bidding an average of $842 for this job

ringsl

See private message.

$2125 USD in 14 days
(180 reviews)
7.8

evisionisfvw

See private message.

$425 USD in 14 days
(27 reviews)
7.4

languages1985

See private message.

$595 USD in 14 days
(55 reviews)
5.1

greenvalleyvw

See private message.

$425 USD in 14 days
(12 reviews)
3.0

sahaja

See private message.

$637.5 USD in 14 days
(3 reviews)
0.0