Objective:
Your task is to create a Java program that scrapes data from the website for both TV shows and movies. The program should use parallelism to make data fetching more efficient. Once the data is fetched, save single-value fields to separate RDS databases for TV shows and movies, and store all other data in DynamoDB. Additionally, implement a fallback mechanism: if saving to a database fails, the program should write the output to a CSV file. Ensure appropriate logging is in place to print the necessary information during execution.
Requirements:
1. Data Fetching:
● Extract relevant information such as title, description, release date, genre, duration, cast, director, etc.
2. Data Storage:
● Save single-value fields (e.g., title, release date) for TV shows to one RDS database and for movies to another RDS database. You will need to create the tables in the database that is shared with you.
● Store all other data (e.g., description, genre, cast) for both TV shows and movies in DynamoDB. The DynamoDB table will be created with a primary key.
● Implement batch processing for saving data to the databases to improve efficiency.
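As one possible reading of the batch-processing requirement, the sketch below splits rows into fixed-size batches and writes each batch with plain JDBC batching (`addBatch` / `executeBatch`). The class `BatchSaveSketch`, the row type `TitleRow`, and the table `movies(title, release_date)` are hypothetical names chosen for illustration; the brief does not prescribe them, and Spring Batch or Spring's `JdbcTemplate.batchUpdate` would be equally valid choices. The same `partition` helper could be reused for DynamoDB, where a single `BatchWriteItem` call accepts at most 25 put or delete requests.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class BatchSaveSketch {

    /** Hypothetical holder for the single-value fields of one title. */
    static final class TitleRow {
        final String title;
        final LocalDate releaseDate; // may be null if the site omits it

        TitleRow(String title, LocalDate releaseDate) {
            this.title = title;
            this.releaseDate = releaseDate;
        }
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Inserts rows in JDBC batches, committing after each batch.
     * Table and column names are assumptions, not part of the brief.
     */
    static void insertBatched(Connection conn, List<TitleRow> rows, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO movies (title, release_date) VALUES (?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<TitleRow> batch : partition(rows, batchSize)) {
                for (TitleRow row : batch) {
                    ps.setString(1, row.title);
                    if (row.releaseDate == null) {
                        ps.setNull(2, Types.DATE);
                    } else {
                        ps.setDate(2, Date.valueOf(row.releaseDate));
                    }
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip per batch, not per row
                conn.commit();
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the batch that failed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Committing once per batch keeps earlier batches if a later one fails; the caller can then pass only the unsaved rows to the fallback path. Whether per-batch or all-or-nothing commits are preferable is a design choice left open by the brief.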
3. Fallback Mechanism:
● If saving to the database fails for any reason, write the output data to a CSV file and share that file.
● The CSV file should include all the fetched data fields for each TV show and movie.
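The fallback described above could look roughly like the following sketch, which uses only the Java standard library. The names `CsvFallbackSketch`, `RecordSaver`, and `saveOrFallback` are hypothetical; `RecordSaver` stands in for whatever RDS or DynamoDB writer the program actually uses. Fields are quoted following the usual CSV convention (wrap in double quotes when a field contains a comma, quote, or line break, and double any embedded quotes); a library such as OpenCSV or Apache Commons CSV would be a reasonable alternative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;

final class CsvFallbackSketch {
    private static final Logger LOG = Logger.getLogger(CsvFallbackSketch.class.getName());

    /** Hypothetical abstraction over the real database writer. */
    interface RecordSaver {
        void saveAll(List<List<String>> rows) throws Exception;
    }

    /** Quotes a field when it contains a comma, quote, or line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvFallbackSketch::escape).collect(Collectors.joining(","));
    }

    /**
     * Tries the database first; on any failure writes header plus rows to csvFile.
     * Returns true if the database save succeeded, false if the CSV fallback was used.
     */
    static boolean saveOrFallback(RecordSaver saver, List<String> header,
                                  List<List<String>> rows, Path csvFile) {
        try {
            saver.saveAll(rows);
            LOG.info("Saved " + rows.size() + " rows to the database");
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING,
                    "Database save failed; writing " + rows.size() + " rows to " + csvFile, e);
            List<String> lines = new ArrayList<>();
            lines.add(toCsvLine(header));
            for (List<String> row : rows) {
                lines.add(toCsvLine(row));
            }
            try {
                Files.write(csvFile, lines, StandardCharsets.UTF_8);
            } catch (IOException io) {
                throw new UncheckedIOException("CSV fallback also failed", io);
            }
            return false;
        }
    }
}
```

A caller might invoke `saveOrFallback(rows -> repository.saveAll(rows), header, rows, Path.of("fallback.csv"))`, where `repository` is whatever persistence component the program defines. Multi-value fields such as cast or genre would need to be flattened into one cell first, for example joined with a separator like `|`.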
4. Database Schema:
● Design appropriate database schemas for the RDS tables that store the single-value fields for TV shows and movies separately. Document your schema designs.
5. Implementation:
● Write clean and well-documented Java code using the latest technology.
● Utilize parallelism (e.g., Java threads, ExecutorService) for data fetching to improve performance.
● Utilize libraries/frameworks such as Spring Boot for web scraping, database interaction, and HTTP requests.
● Implement batch processing for database operations using Spring Batch or similar technologies.
● Implement logging statements to print necessary information during execution.
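The parallel-fetching requirement can be sketched with a fixed-size `ExecutorService`, as below. The class `ParallelFetchSketch` and the `fetcher` function are hypothetical: `fetcher` stands in for the real download-and-parse step (for example an HTTP call followed by HTML parsing), which this sketch deliberately leaves out. Failed fetches are logged and skipped so that one bad page does not abort the whole run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.logging.Logger;

final class ParallelFetchSketch {
    private static final Logger LOG = Logger.getLogger(ParallelFetchSketch.class.getName());

    /**
     * Applies fetcher to every URL on a pool of `threads` worker threads.
     * Results come back in the same order as the input URLs; URLs whose
     * fetch threw an exception are logged and left out of the result.
     */
    static <T> List<T> fetchAll(List<String> urls, Function<String, T> fetcher, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.apply(url));
            }
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the submitted tasks.
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    LOG.warning("Fetch failed for " + urls.get(i) + ": " + e.getCause());
                }
            }
            LOG.info("Fetched " + results.size() + " of " + urls.size() + " pages");
            return results;
        } finally {
            pool.shutdown(); // no further tasks; worker threads exit once idle
        }
    }
}
```

A fixed pool bounds the number of simultaneous requests, which matters when scraping a single site; the right pool size depends on the target site's rate limits and is an assumption to tune, not something the brief specifies.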
6. Testing (Optional):
● Implement unit tests to ensure the correctness of your code.
● Test your program with various scenarios to handle edge cases gracefully.
7. Documentation:
● Write clear documentation explaining how your program works, including any assumptions made and potential limitations.
● Include setup instructions, usage examples, and any additional information that may be helpful for someone reviewing your code.
Additional Notes:
● You are encouraged to use the latest Java technologies and libraries to demonstrate your knowledge and skills.
● You are required to upload all the source code, with its dependencies, to your Git repository and share the repository with us.
● When extracted, the code needs to be runnable on a new machine with Java 11+.
● Pay attention to efficient batch processing implementation for database operations to optimize performance.
● Ensure that the CSV output is well-formatted and contains all necessary data fields.
● Utilize parallelism effectively to improve the efficiency of data fetching.
● Implement logging statements strategically to provide insights into the program's execution flow.
● Deployment using Docker is optional.