Itm 4273
Author: goude2017 • November 1, 2017 • 1,511 Words (7 Pages) • 577 Views
...
3.
- Low latency, interactive reports, and OLAP
- Data warehouses are usually known for their low latency and interactive reporting, but new data tools present big data technologies as interesting alternatives.
- ANSI 2003 SQL compliance is required
- Data warehouses support standard ANSI SQL, which improves query response on data.
- Preprocessing or exploration of raw unstructured data
- Hadoop uses a storage grid and provides long-term storage of data.
- Online archives alternative to tape
- Hadoop is preferred as an online archive because tape takes longer to restore data and may lose data in the process.
- High-quality cleansed and consistent data
- Data warehouses have integrated data quality functions built in. The data used for analytics needs to be meaningful for end users.
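The kind of cleansing those built-in data quality functions perform can be sketched in plain Python. This is only an illustration of the idea (trim whitespace, normalize case, reject incomplete records); the function and field names are invented, not part of any real warehouse API.

```python
# Toy sketch of data cleansing before loading into a warehouse:
# trim whitespace, normalize case, and drop records that fail a
# simple quality check. All names here are hypothetical.

def cleanse(records):
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        region = (rec.get("region") or "").strip().upper()
        if not name or not region:
            continue  # drop records that fail the quality check
        cleaned.append({"name": name, "region": region})
    return cleaned

raw = [
    {"name": "  alice ", "region": "west "},
    {"name": "", "region": "east"},   # fails quality check
    {"name": "bob", "region": None},  # fails quality check
]
print(cleanse(raw))  # [{'name': 'Alice', 'region': 'WEST'}]
```

Only the first record survives, which is the point: cleansed, consistent data is what makes the warehouse's analytics meaningful to end users.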
6. 100s to 1000s of concurrent users: Many users at one time may slow down a system.
- A data warehouse and Hadoop are capable of allowing many users to access information at once.
- Advanced indexes enable numerous performance gains in data warehouses.
- Optimizer examines incoming SQL and speeds up queries
- Hadoop runs in parallel to analyze large data sets with increased processing power.
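The index-and-optimizer point above can be demonstrated with SQLite, which ships with Python. SQLite is only a stand-in here (a real data warehouse has far more advanced index types), but the mechanism is the same: the optimizer examines the incoming SQL and, once an index exists, uses it instead of scanning the whole table.

```python
import sqlite3

# Toy illustration of how an index changes the query plan.
# SQLite stands in for a data warehouse; the table and column
# names are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, f"region{i % 10}", i * 1.5) for i in range(1000)],
)

# Without an index, the optimizer must scan the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE region = 'region3'"
).fetchall()
print(plan)  # plan shows a full table scan

# After adding an index, the optimizer uses it instead.
conn.execute("CREATE INDEX idx_region ON sales (region)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE region = 'region3'"
).fetchall()
print(plan)  # plan shows a search using idx_region
```

With hundreds of concurrent users issuing queries like this, that scan-versus-index difference is where the performance gains come from.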
7. Discover unknown relationships in the data: Businesses need BI systems that can discover new trends in huge data sets. This gives them a competitive advantage and helps them make more informed business decisions.
- Data warehouses can create reports and complex analysis quickly and easily using BI tools
- Data warehouses are easy for end users to work with, resulting in better discovery of trends
- Business users can easily demand more reports than IT has staffing to provide.
- DW users can run reports and get results in minutes
- Hadoop is the repository and refinery for raw data
- Hadoop is designed to store large data sets, process the data, and deliver insights to end users.
8. Parallel complex process logic: Several clusters work in parallel to process complex algorithms written by programmers.
- Hadoop is suitable for this because it runs on clusters that process large amounts of data in parallel
- MapReduce searches out specific data and refines it into useful information
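The MapReduce idea can be sketched in plain Python. This is a toy, single-machine imitation: real Hadoop runs the map and reduce tasks on many cluster nodes at once, but the map → shuffle → reduce phases are the same.

```python
from collections import defaultdict
from itertools import chain

# Classic MapReduce word count, run locally as a sketch.

def map_phase(line):
    # Emit (key, value) pairs, one per word.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as Hadoop's shuffle/sort step does.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big insights", "big clusters"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'insights': 1, 'clusters': 1}
```

On a cluster, each node would run `map_phase` over its own slice of the input and `reduce_phase` over its share of the keys, which is how complex logic gets processed in parallel.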
9. CPU intense analysis: Analysis that puts strain on a CPU because of the volume of data being processed. The CPU is like an engine: when you press the gas pedal all the way down for a long period of time, the engine will eventually overheat.
- DWs involve CPU-intensive analysis because some queries can take over 24 hours, depending on how big the DW is
- When using SQL in a DW, CPU power is necessary for processing large data sets
- Hadoop requires CPU intense analysis because the data being processed is very large.
- Even though data is being analyzed in parallel, the amount of data being processed still puts a strain on the CPU
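That last point can be made concrete with a sketch. The data is split into partitions, as Hadoop splits data across nodes, but every record still has to be processed on some CPU; parallelism spreads the load, it doesn't remove it. The per-record work here is a stand-in hash computation, and the partitions are processed one after another where a cluster would run each on its own node.

```python
import hashlib

def analyze(record):
    # Deliberately CPU-heavy per-record work (a toy stand-in for
    # real analysis logic).
    return hashlib.sha256(record.encode()).hexdigest()

records = [f"record-{i}" for i in range(10_000)]

# Split the data into partitions, as Hadoop splits data across nodes.
n_partitions = 4
partitions = [records[i::n_partitions] for i in range(n_partitions)]

# Each partition is processed in full; on a cluster these loops
# would run simultaneously on different nodes' CPUs.
results = [[analyze(r) for r in part] for part in partitions]
total_processed = sum(len(part) for part in results)
print(total_processed)  # 10000 — every record is analyzed either way
```

Four partitions or one, 10,000 records of CPU work get done, which is why both the DW and Hadoop columns land on "CPU intense" for this workload.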
10. System, users, and data governance: Data governance is when the data going into the DW meets precise standards. This is important because the DW is structured, and the quality of information is important.
- The data is modeled after business concepts.
- Ensures that data is structured for business use throughout the enterprise system
- This is important so that the firm can trust the data that they are extracting from the DW
11. Many Flexible programming languages running in parallel: Ability to run languages simultaneously
- Hadoop is a parallel processing framework
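One way Hadoop supports many languages is Hadoop Streaming, which lets any program that reads stdin and writes stdout act as a map or reduce task. Below is a hedged sketch of a Streaming-style mapper in Python; the function name and sample input are invented, but the tab-separated `key\tvalue` output format is what Streaming expects from a mapper.

```python
import sys

def mapper(lines, out=sys.stdout):
    # Emit tab-separated (key, value) pairs, the wire format a
    # Hadoop Streaming mapper writes to stdout.
    for line in lines:
        for word in line.split():
            out.write(f"{word.lower()}\t1\n")

# In a real job this script would read sys.stdin, and Hadoop would
# feed each mapper instance its own slice of the input in parallel;
# here we just feed it a small list.
mapper(["Big Data", "Big clusters"])
```

Because the contract is only "read lines, write key/value lines", the same job could mix mappers written in Python, Ruby, or any other language, which is the flexibility this item refers to.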
12. Unrestricted, ungoverned sandbox explorations: No limits to data exploration
- Hadoop handles many types of raw data which can be analyzed at the discretion of the analyst.
13. Analysis of provisional data: Ability to analyze data that may be relevant today but not tomorrow
- Hadoop has the advantage of flexibility and time to value
14. Extensive security and regulatory compliance: Ability to hold data securely for proof of regulatory compliance
- A data warehouse is nonvolatile and can keep data for long periods of time
15. Real time data loading and 1 second tactical queries: Ability to take in large data sets quickly and handle short, direct-access queries within a second.
- Data warehouse
- Basic indexing is a standard feature used to improve query response time
- Optimizer that examines incoming SQL
- Hadoop
- Data is stored on local storage instead of SANs
- It doesn’t slow down due to network speed when moving large amounts of data
- Searching through Hadoop is faster and easier than spinning through magnetic disks.
...