
Thoughts-Linking SEIS 630, 736, 631, 632 & 732

736:
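Hadoop = framework for distributed storage (HDFS, the Hadoop Distributed File System) and batch processing (MapReduce); HDFS is commonly used to store the input and output of Apache Spark jobs
Apache Spark = in-memory processing framework written in Scala, with APIs for Scala, Java, Python and R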
630:
SQL = language for querying structured data in a relational database (Oracle, MySQL,
SQL Server)
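
A minimal sketch of querying structured data with SQL, using Python's built-in sqlite3 module as a stand-in for Oracle, MySQL or SQL Server. The enrollment table, its columns and the sample rows are made up for illustration.

```python
import sqlite3


def students_in_course(course):
    """Return the students enrolled in one course, sorted by name."""
    # In-memory database; the table and its rows are hypothetical sample data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?)",
        [("Ana", "SEIS 630"), ("Ben", "SEIS 736"), ("Cy", "SEIS 630")],
    )
    # The data fits rows & columns, so one SELECT with a WHERE clause is enough.
    rows = conn.execute(
        "SELECT student FROM enrollment WHERE course = ? ORDER BY student",
        (course,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(students_in_course("SEIS 630"))  # ['Ana', 'Cy']
```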
736:
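Hive = data warehouse layer on top of Hadoop that lets you query data stored in HDFS
with a SQL-like language (HiveQL). Non-relational databases like MongoDB and Cassandra
do not use the relational model and are also known as NoSQL databases; Hadoop itself is
a storage and processing framework rather than a database.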
630:
Structured data is not big data by default; big data is defined by volume, velocity and variety, not by structure alone.
736:
Big data is mostly unstructured or semi-structured data, but unstructured data is not big data by definition.
630:
Structured data is data that fits into rows & columns and is stored in a relational database
(Oracle, MySQL, SQL Server).
736:
Unstructured data is data that doesn't fit into rows & columns; it is typically stored in a
non-relational database (MongoDB, Cassandra) or as files on Hadoop (HDFS).
Examples of unstructured data include emails, audio, images, text, video & social media.

NewSQL databases = the scalability of a non-relational database (which scales better than a
relational database because it trades off the ACID properties, focused on data reliability,
for the BASE properties, focused on data availability) combined with the structure of a
relational database (better than a non-relational database because a certain level of
structure is needed for making accurate decisions that fulfill business requirements).
NewSQL systems aim to get that scalability while still keeping ACID transactions.

630 & 736:


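pH analogy: just as a pH level measures relative basicity and acidity, database transactions
range from BASE (non-relational databases) to ACID (relational databases).
ACID = Atomicity, Consistency, Isolation, Durability.
BASE = Basically Available, Soft state, Eventual consistency.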
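
The "A" in ACID (atomicity) can be sketched with Python's built-in sqlite3 module: either every statement in a transaction is applied or none is. The account table, the names and the amounts are hypothetical, and the overdraft rule is an assumption added only to force a rollback.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (lowest,) = conn.execute("SELECT MIN(balance) FROM account").fetchone()
            if lowest < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    ok = transfer(conn, "a", "b", 30)        # applied as a whole
    failed = transfer(conn, "a", "b", 500)   # rolled back as a whole
    balances = dict(conn.execute("SELECT name, balance FROM account"))
    conn.close()
    return ok, failed, balances


print(demo())  # (True, False, {'a': 70, 'b': 30})
```

The failed transfer leaves no partial update behind, which is the reliability that a BASE system relaxes in exchange for availability.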

630 & 732:


Business Intelligence and Data Warehousing Data Models are Key to Database
Design

632:
Data modeling/ data structure involves visualizing data through use of graphical tools,
so you will want to obtain a data modeling software package or use graphical
capabilities in existing software.

A data model explicitly determines the structure of data in the context of data science;
in the context of computer science it is also known as a data structure.

630 & 732:


Business intelligence is where you organize data (add structure to unstructured data)
in order to perform analytic operations on it, such as: query by multiple criteria, slice
and dice, drill down & roll up.
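
A minimal sketch of these analytic operations in plain Python. The sales rows, the field names and the helper functions total_by and slice_rows are all hypothetical; a real BI tool would run the same kind of grouping against a data warehouse.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, product, amount).
FIELDS = ("year", "quarter", "region", "product", "amount")
SALES = [
    (2023, "Q1", "North", "Widget", 100),
    (2023, "Q1", "South", "Widget", 80),
    (2023, "Q2", "North", "Gadget", 50),
    (2024, "Q1", "North", "Widget", 120),
    (2024, "Q1", "South", "Gadget", 70),
]


def total_by(rows, *dims):
    """Sum the amount column grouped by the given dimensions."""
    positions = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[FIELDS.index("amount")]
    return dict(totals)


def slice_rows(rows, **criteria):
    """Keep only the rows that match every criterion (slice and dice)."""
    return [
        row for row in rows
        if all(row[FIELDS.index(k)] == v for k, v in criteria.items())
    ]


roll_up = total_by(SALES, "year")                 # coarse: totals per year
drill_down = total_by(SALES, "year", "quarter")   # finer: per year and quarter
north_widgets = slice_rows(SALES, region="North", product="Widget")

print(roll_up)       # {(2023,): 230, (2024,): 190}
print(drill_down)    # {(2023, 'Q1'): 180, (2023, 'Q2'): 50, (2024, 'Q1'): 190}
print(north_widgets)
```

Roll up and drill down are the same grouping at different levels of detail; querying by multiple criteria is a filter applied before grouping.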

732:
A data warehouse is a set of data models/data structures that aggregates structured
data and stores all the information so that it can be used for data analysis & reporting,
which in turn supports business intelligence.

A Data Analyst applies basic statistical algorithms to analyze structured data (stored in a
relational database as table rows) in order to improve decision making/business
intelligence.

A Data Scientist applies advanced statistical algorithms to analyze unstructured
data (stored in a non-relational database) in order to improve decision making/business
intelligence.

Software skills for data scientists

Data Science languages: Python/R
Relational databases: MySQL, PostgreSQL
Non-relational databases: MongoDB
Distributed computing: Hadoop, Spark
Cloud: GCP/AWS/Azure
Machine learning models: e.g. Regression, Boosted Trees, SVM, NNs
Graph: Neo4j, GraphX
API Interaction: OAuth, REST
Data Visualisation and Webapps: D3, RShiny

A programming language defines the syntax and semantics in which programs are written.

A platform is the execution environment in which programs written in a programming
language run (for example, the JVM for Java).

A framework is a set of libraries containing built-in functions/methods, data
structures and classes used for developing desktop or web applications.

An API is the interface through which application code calls a framework.

A desktop application (standalone application/executable) is a computer program that is
installed on a desktop computer and runs locally, without needing an internet connection
to start. Examples include Google Chrome and Notepad++.

A web application is a client-server computer program in which the client side/GUI runs
in a web browser and communicates with the server computer over a network connection,
usually the internet. In computing, a server is a computer designed to process requests
and deliver data to another computer over the internet or a local network. A web
server serves pages and runs apps that users reach through web browsers.
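
The request/response exchange between a client and a web server can be sketched with Python's standard library. HelloHandler and fetch_hello are hypothetical names; the server listens only on the local machine on a free port chosen by the operating system, and a small HTTP client stands in for the web browser.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: process a GET request and deliver data to the client."""

    def do_GET(self):
        body = b"Hello, world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_hello():
    """Start the server in a background thread, then act as the client."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(fetch_hello())  # (200, 'Hello, world')
```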

A cluster is an implementation of shared computation. A server cluster spreads the work
of a single logical computer across multiple servers, so it is a 1:M client-to-servers
relationship.

In distributed computing, each processor has its own private memory/ distributed
memory. Information is exchanged by passing messages between the
processors. Server clustering is used for high performance distributed computing.

In parallel computing, all processors have access to a shared memory to exchange
information between processors.
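
The two ways of exchanging information can be contrasted in a small sketch. Both functions are hypothetical and use threads inside one Python process purely for illustration; in real distributed computing the workers would run on separate machines and the messages would travel over a network.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Parallel style: every worker adds into one shared variable."""
    total = {"value": 0}
    lock = threading.Lock()  # guards the shared memory against races

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Distributed style: workers keep private state and only send messages."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # the message is the only thing exchanged

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in chunks)


data = [[1, 2, 3], [4, 5], [6]]
print(shared_memory_sum(data), message_passing_sum(data))  # 21 21
```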

https://medium.com/@martinomburajr/java-create-your-own-hello-world-server-2ca33b6957e

https://medium.freecodecamp.org/lessons-learned-from-deploying-my-first-full-stack-web-application-34f94ec0a286

https://www.infoq.com/articles/raw-data-to-data-science?utm_source=infoq&utm_campaign=user_page&utm_medium=link

https://www.lifewire.com/servers-in-computer-networking-817380

https://coderanch.com/t/636684/java/Difference-Web-application-desktop-application
