De Anza Course Outlines

Credit - Degree applicable
Effective Quarter: Fall 2020

I. Catalog Information

CIS 64F
Introduction to Big Data and Analytics
4 Unit(s)


Requisites: Advisory: EWRT 211 and READ 211, or ESL 272 and 273.

Hours: Lec Hrs: 48.00
Out of Class Hrs: 96.00
Total Student Learning Hrs: 144.00

Description: Introduction to the Big Data deluge, the management of structured and unstructured data, and the design of large-scale database systems. Concepts covered include MapReduce parallel processing algorithms, real-time analytics, classification, predictive analytics, and the attributes of Big Data and related issues. The course also introduces large-scale file systems and their operations, and parallel processing algorithms.
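The MapReduce model named in the description can be illustrated with a minimal, single-process Python sketch of the classic word-count job. It only simulates the map, shuffle, and reduce phases in memory; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `map_reduce`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value seen for one key into a single result."""
    return key, sum(values)


def map_reduce(documents: List[str]) -> Dict[str, int]:
    # Each document could be mapped on a different worker; here they run sequentially.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    # Each key could also be reduced independently, in parallel.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data beats ideas"]
    print(map_reduce(docs))  # {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

In a real cluster the framework distributes the map calls across nodes and performs the shuffle over the network; the sketch keeps only the data flow between the phases.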


Student Learning Outcome Statements (SLO)

• Student Learning Outcome: Design, implement, and debug a large-scale database system using a technology such as Hadoop or Cassandra.

• Student Learning Outcome: Perform data analysis using a large-scale database system, given a set of user requirements.


II. Course Objectives

A. Explore big data technologies as a means of solving key business analytics problems.
B. Interpret and analyze techniques for setting up patterns for data analysis.
C. Compare and contrast data and relation algorithms.
D. Examine data preprocessing and visualization techniques that enable data analytics scenarios.
E. Articulate the characteristics of regression, forecasting, and classification techniques for predictive analytics.
F. Interpret and analyze the architecture of database clustering technologies.

III. Essential Student Materials

 None

IV. Essential College Facilities

 None

V. Expanded Description: Content and Form

A. Explore big data technologies as a means of solving key business analytics problems.
1. Data analytics, data mining, and knowledge discovery.
2. Competitive intelligence and big data.
3. Business case studies: Electronic Health Records (EHR), U.S. Department of Transportation.
B. Interpret and analyze techniques for setting up patterns for data analysis.
1. Relational (RDBMS) modeling
2. NoSQL database modeling
3. Data warehouse modeling, data mining, and online analytical processing (OLAP).
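As an illustration of data warehouse modeling and OLAP, the sketch below builds a tiny hypothetical star schema (one fact table, one dimension table) in an in-memory SQLite database using Python's standard `sqlite3` module, then performs a roll-up aggregation along one dimension. The table names, columns, and data are invented for the example.

```python
import sqlite3

# Hypothetical star schema: one fact table (sales) keyed to one dimension table (store).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES store(store_id),
        amount REAL NOT NULL
    );
    INSERT INTO store VALUES (1, 'North'), (2, 'North'), (3, 'South');
    INSERT INTO sales (store_id, amount) VALUES (1, 100.0), (2, 50.0), (3, 75.0), (3, 25.0);
    """
)

# OLAP-style roll-up: aggregate the fact table along the region dimension.
rollup = conn.execute(
    """
    SELECT s.region, SUM(f.amount) AS total_sales
    FROM sales AS f JOIN store AS s ON f.store_id = s.store_id
    GROUP BY s.region
    ORDER BY s.region
    """
).fetchall()
print(rollup)  # [('North', 150.0), ('South', 100.0)]
```

Keeping measures in a narrow fact table and descriptive attributes in dimension tables is what lets the same `GROUP BY` pattern roll up along any dimension.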
C. Compare and contrast data and relation algorithms.
1. Auto-Associator
2. Component Analysis
3. Diagrams
4. Multidimensional Scaling
5. Histograms
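Assuming "Component Analysis" here refers to principal component analysis (PCA), a minimal NumPy sketch of the technique: center the data, eigendecompose the covariance matrix, and project onto the eigenvectors with the largest eigenvalues. The `pca` function and the synthetic data are illustrative only.

```python
import numpy as np


def pca(data: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of `data` onto its top principal components."""
    centered = data - data.mean(axis=0)              # center each feature at zero
    cov = np.cov(centered, rowvar=False)             # feature-by-feature covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]            # largest-variance directions first
    components = eigenvectors[:, order[:n_components]]
    return centered @ components                     # coordinates in the reduced space


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    # Two strongly correlated features plus a little noise.
    points = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=100)])
    projected = pca(points, n_components=1)
    print(projected.shape)  # (100, 1)
```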
D. Examine data preprocessing and visualization techniques that enable data analytics scenarios.
1. Error types and error handling
2. Filtering
3. Data transformation
4. Data merging
5. Linear correlation; correlation and causality
6. Chi-square test for independence
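Two of the listed techniques, linear (Pearson) correlation and the chi-square test for independence, reduce to short formulas. The plain-Python sketch below computes the correlation coefficient of two samples and the chi-square statistic of a contingency table; the function names are hypothetical, and the statistic must still be compared against a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom to reach a conclusion.

```python
import math
from typing import List, Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def chi_square_statistic(table: List[List[float]]) -> float:
    """Chi-square statistic for independence of the row and column variables."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat
```

For example, `chi_square_statistic([[10, 20], [20, 10]])` gives about 6.67, which exceeds 3.841, the 5% critical value for one degree of freedom, so independence would be rejected at that level. A high correlation, by contrast, does not by itself establish causality.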
E. Articulate the characteristics of regression, forecasting, and classification techniques for predictive analytics.
1. Linear regression, linear regression with nonlinear substitution, and robust regression.
2. Cross-validation and feature selection.
3. Finite state machines, recurrent models, and autoregressive models.
4. Classification criteria, naive Bayes classifier, and linear discriminant analysis.
5. Support vector machines, nearest neighbor classifier, and learning vector quantization.
6. Decision trees
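Linear regression and cross-validation from the list above can be combined in a short NumPy sketch: ordinary least squares via `np.linalg.lstsq`, scored by k-fold cross-validation. The helper names and the default of five folds are assumptions for illustration, not part of the course materials.

```python
import numpy as np


def fit_linear_regression(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares: return weights [intercept, slope_1, ..., slope_d]."""
    design = np.column_stack([np.ones(len(x)), x])  # prepend a column of ones for the intercept
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def predict(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply fitted weights to new inputs."""
    design = np.column_stack([np.ones(len(x)), x])
    return design @ weights


def k_fold_mse(x: np.ndarray, y: np.ndarray, k: int = 5, seed: int = 0) -> float:
    """Average held-out mean squared error over k train/test splits."""
    indices = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        weights = fit_linear_regression(x[train_idx], y[train_idx])
        residuals = y[test_idx] - predict(weights, x[test_idx])
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))
```

For feature selection, one could call `k_fold_mse` once per candidate feature subset and keep the subset with the lowest held-out error.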
F. Interpret and analyze the architecture of database clustering technologies.
1. Hadoop
2. Oracle RAC
3. MySQL Cluster
4. Windows Clustering
5. Cassandra
6. TrackVia; nCluster from Teradata

VI. Assignments

A. Readings from the text.
B. Documenting, coding, testing, and debugging six to ten programs, with guidance provided and a clearly documented design; half are completed in the computer lab and half as homework.

VII. Methods of Instruction

Lecture and visual aids
Discussion of assigned reading
Discussion and problem solving performed in class
Collaborative learning and small group exercises
Collaborative projects
Homework and extended projects

VIII. Methods of Evaluating Objectives

A. One or two midterm examinations requiring some programming and clarification of concepts, and demonstrating mastery of large-scale database system principles.
B. A final examination requiring clarification of concepts and demonstrating mastery of large-scale database system principles.
C. Evaluation of programming assignments based on correctness, documentation, code quality, and test plan execution.

IX. Texts and Supporting References

A. Examples of Primary Texts and References
1. Jay Liebowitz, DSc, "Big Data and Business Analytics", CRC Press, 2013.
2. Thomas A. Runkler, "Data Analytics: Models and Algorithms for Intelligent Data Analysis", Vieweg+Teubner Verlag, 2nd edition, 2016.
B. Examples of Supporting Texts and References
1. None