Data Analytics Project: Use MapReduce with Java for data profiling and cleaning
$10-30 USD
In progress
Posted about 2 years ago
Paid on delivery
Clean and profile the tweets by removing hashtags, emoticons, and any other redundant data that is not useful for analysis, and organize the user_location column into a common standard format. The dataset is attached; alternatively, you can get it from the link below:
[login to view URL]
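The tweet cleaning and user_location normalization described above could be sketched as two plain-Java helpers that a MapReduce mapper would call on each field. This is a minimal sketch under stated assumptions: the class and method names (`TweetCleaner`, `cleanText`, `normalizeLocation`) are hypothetical, the emoji ranges only roughly approximate the Unicode emoji blocks, and "common standard format" is assumed here to mean lower-case, single-spaced text with uniform comma spacing and `unknown` for empty values.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical per-field cleaning helpers that a MapReduce mapper could call. */
final class TweetCleaner {
    // Hashtags such as #BigData (ASCII word characters only).
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");
    // Rough set of emoji blocks, plus the variation selector and zero-width joiner.
    private static final Pattern EMOJI =
            Pattern.compile("[\\x{1F300}-\\x{1FAFF}\\x{2600}-\\x{27BF}\\x{FE0F}\\x{200D}]");
    // Common text emoticons like :) ;-) :D :P when they stand alone as a token.
    private static final Pattern TEXT_EMOTICON =
            Pattern.compile("(?<!\\S)[:;=][-']?[)(DPp](?!\\S)");

    /** Removes hashtags, emoji and text emoticons, then collapses leftover whitespace. */
    static String cleanText(String text) {
        if (text == null) return "";
        String s = HASHTAG.matcher(text).replaceAll("");
        s = EMOJI.matcher(s).replaceAll("");          // emoji first, so ":)😀" still leaves a lone ":)"
        s = TEXT_EMOTICON.matcher(s).replaceAll("");
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Assumed standard format: lower case, single spaces, ", " separators, "unknown" if empty. */
    static String normalizeLocation(String raw) {
        if (raw == null) return "unknown";
        String s = raw.replaceAll("[^\\p{L}\\p{N}\\s,.\\-]", "")   // drop emoji, symbols, other punctuation
                      .replaceAll("\\s*,\\s*", ", ")                 // uniform comma spacing
                      .replaceAll("\\s+", " ")                       // collapse runs of whitespace
                      .replaceAll("^[,\\s]+|[,\\s]+$", "")           // strip stray edge commas/spaces
                      .toLowerCase(Locale.ROOT);
        return s.isEmpty() ? "unknown" : s;
    }
}
```

For example, `cleanText("Loving this #Hadoop course :) 😀 so much")` returns `"Loving this course so much"`, and `normalizeLocation("  New York ,NY!! ")` returns `"new york, ny"`. Lower-casing was chosen because it is deterministic; a title-case rule would need special handling for abbreviations such as "NY".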
Tasks:
Data profiling: Write MapReduce Java code to characterize (profile) the data in each column.
Data cleaning: Write MapReduce Java code to ETL (extract, transform, load) the data source: remove hashtags, emoticons, and any other redundant data that is not useful for analysis from the tweets, drop unimportant columns, normalize the data in a column, and detect badly formatted rows.
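The two tasks above could be sketched in plain Java as map and reduce functions, with the shuffle simulated in memory so the logic can be tried without a Hadoop cluster. In a real job, these bodies would go inside `Mapper.map` and `Reducer.reduce` from `org.apache.hadoop.mapreduce`. Assumptions that do not come from the posting: the input is CSV with the header line already removed, columns are identified by index, the per-column profile is limited to count, empty values, distinct values and maximum length, and a row counts as "badly formatted" when its field count differs from the expected one. The class name `TweetJobs`, its methods, and the column indexes used below are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical in-memory sketch of the profiling and ETL jobs. In Hadoop, profileMap and
 * etlMap would form the body of Mapper.map and profileReduce the body of Reducer.reduce;
 * here the shuffle is simulated with a TreeMap so the logic runs without a cluster.
 */
final class TweetJobs {

    /** Splits one CSV line, honoring double-quoted fields ("" is an escaped quote). */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                cur.append('"');
                i++;                                  // skip the second quote of ""
            } else if (c == '"') {
                inQuotes = !inQuotes;                 // open or close a quoted field
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());           // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Re-quotes a field for CSV output when it contains a comma or a quote. */
    static String quote(String v) {
        return (v.contains(",") || v.contains("\"")) ? "\"" + v.replace("\"", "\"\"") + "\"" : v;
    }

    /** Profiling map: emit (columnIndex, value) for every field of one row. */
    static List<Map.Entry<Integer, String>> profileMap(String line) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        List<String> fields = splitCsv(line);
        for (int i = 0; i < fields.size(); i++) {
            out.add(Map.entry(i, fields.get(i)));
        }
        return out;
    }

    /** Profiling reduce: summarize every value observed for one column. */
    static String profileReduce(int column, List<String> values) {
        long empty = values.stream().filter(v -> v.trim().isEmpty()).count();
        long distinct = values.stream().distinct().count();
        int maxLen = values.stream().mapToInt(String::length).max().orElse(0);
        return "col" + column + " count=" + values.size() + " empty=" + empty
                + " distinct=" + distinct + " maxLen=" + maxLen;
    }

    /** Map, simulated shuffle (group values by column), then reduce; lines exclude the header. */
    static List<String> profile(List<String> lines) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<Integer, String> kv : profileMap(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        List<String> report = new ArrayList<>();
        grouped.forEach((col, vals) -> report.add(profileReduce(col, vals)));
        return report;
    }

    /** ETL map-only step: the cleaned CSV row, or null if the field count is wrong. */
    static String etlMap(String line, int expectedFields, Set<Integer> dropCols, int locationCol) {
        List<String> fields = splitCsv(line);
        if (fields.size() != expectedFields) {
            return null;                              // badly formatted row
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            if (dropCols.contains(i)) {
                continue;                             // drop an unimportant column
            }
            String v = fields.get(i);
            if (i == locationCol) {                   // normalize the location column
                v = v.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
            }
            kept.add(quote(v));
        }
        return String.join(",", kept);
    }
}
```

For example, with `expectedFields = 4`, `dropCols = Set.of(0)` and `locationCol = 3`, the row `1,alice,"Hello, world",  New   York ` becomes `alice,"Hello, world",new york`, while `3,broken` returns `null`. A Hadoop mapper could count such rows with a Counter instead of emitting them. Hadoop's default `TextInputFormat` splits input on line breaks, so a quoted tweet containing a newline would reach the mapper as separate lines and would likely be flagged as badly formatted here. Counting distinct values this way also keeps a whole column in the reducer's memory, which is acceptable for a sketch but may not scale to large columns.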