Crawl, Edit and Publish
$500-5000 USD
Paid on delivery
This project is about building a web application that collects information, edits it automatically, and republishes it in wiki style. The application should do the following:

1. COLLECTING
a. Crawl a predefined list of web sites (or use Google to list all pages from those sites). Example: www.do.se.
b. Save all pages that meet certain criteria. Criteria might include keywords or a specific "form", such as all single-word titles. Example: pages containing the words "it is" in the title.
c. The information to save is the main headings and text, the original URL, and links to images and other media. Since some sites use improper HTML structure, additional criteria may be needed to define more exactly what to fetch, e.g. based on CSS styles or similar.
d. When possible, each page should be associated with keywords taken from its meta keywords and from, for instance, the original site structure. For example, if the site has breadcrumbs, each part of the breadcrumb trail could be saved as a keyword. Example: on the following page, which has breadcrumbs, the word "Lagar" should be saved as a keyword: [url removed]

2. EDITING
a. The title of each page should be edited automatically according to certain criteria; other criteria should raise warnings for an editor to check. For instance, certain words are removed from the title automatically, and titles containing too many words trigger an alert to the editor.
b. Images should be removed and replaced with a URL pointing to the original site.

3. PUBLISHING
a. Each page is recreated as a wiki page. The page title is the title generated in step 2.
b. Multiple pages with the same title should be merged into one automatically.
c. The site should contain a search engine that searches titles and keywords (probably not text content).
d. The wiki should follow the standard model (like Wikipedia). All URLs should be natural-language.
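
To make the EDITING and PUBLISHING requirements (steps 2 and 3) more concrete, here is a minimal Python sketch of how title editing, editor warnings, same-title merging, and natural-language URLs could fit together. Everything in it is an assumption made for illustration: the names `REMOVE_WORDS`, `MAX_TITLE_WORDS`, `edit_title`, `slugify`, and `merge_pages`, the rule values, the page dictionary layout, and the underscore-style URL segment are hypothetical and not part of the posting, and the platform itself remains open for suggestions.

```python
import re

# Hypothetical rule values; the posting does not specify the actual criteria.
REMOVE_WORDS = {"welcome", "home"}   # words stripped from titles automatically
MAX_TITLE_WORDS = 6                  # longer titles raise a warning for the editor


def edit_title(raw_title):
    """Return (edited_title, warnings) for one crawled page title."""
    words = [w for w in raw_title.split() if w.lower() not in REMOVE_WORDS]
    warnings = []
    if len(words) > MAX_TITLE_WORDS:
        warnings.append(f"title has {len(words)} words (limit {MAX_TITLE_WORDS})")
    if not words:
        warnings.append("title is empty after editing")
    return " ".join(words), warnings


def slugify(title):
    """Build a natural-language URL segment by joining the words with underscores."""
    return re.sub(r"\s+", "_", title.strip())


def merge_pages(pages):
    """Edit each title, then merge pages whose edited titles match (ignoring case)."""
    merged = {}
    for page in pages:
        title, warnings = edit_title(page["title"])
        entry = merged.setdefault(title.lower(), {
            "title": title,
            "slug": slugify(title),
            "sections": [],
            "keywords": set(),
            "warnings": [],
        })
        # Keep the original URL with each text block so the source stays traceable.
        entry["sections"].append({"text": page["text"], "source_url": page["url"]})
        entry["keywords"].update(page.get("keywords", []))
        entry["warnings"].extend(warnings)
    return list(merged.values())
```

Called with a list of dictionaries holding `title`, `text`, `url`, and optional `keywords`, `merge_pages` returns one entry per distinct edited title. Keeping warnings on the merged entry, rather than rejecting the page, matches the requirement that some criteria only alert an editor.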
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Open for suggestions.
Project ID: #3538367