School: Ciencias Básicas, Tecnología e Ingeniería - Program: Ingeniería de Sistemas - Course: Base de Datos Avanzada (Advanced Databases)
Intermediate Stage, Phase 3 - Unit 3 - Introduction, Modeling, and Administration of Big Data Systems
(https://github.com/Unad-BDAvanzadas/U3_Caso_Material_Estudio)
If you had your choice of Big Data analysis areas to work in, which would you choose? If you can, describe a specific type of data or
problem you would like to work on.
Health/Medical
City/Government/Infrastructure
Personalized Marketing
Product Growth
Something else?
The news often discusses how social media sites like Twitter and Facebook gather data on their users. But take a minute to
think in detail about the various ways you interact with machines and applications on a given day.
What's one surprising or uncomfortable thing you may be providing data on?
Is there a non-social media (or shopping) application you realize you do give information to (perhaps that you hadn't thought of
before)?
What questions can you formulate that could be answered with this data that would bring value to the company?
In the value lectures, we mentioned a social media game called Catch the Flamingo. Big data is generated when we track all of the
user's data and store them in our database.
Why do you think it would be beneficial to collect player data and perform analysis on it for the future of the game? In particular,
which aspect of the game could you imagine we could improve using this data?
Think about a big data problem of interest to you (e.g., in your career, in your life, etc.). When you think about that problem, which of
the 5 Ps is most interesting or most challenging to address?
Note: If you don't have a big data problem of your own, think about Ilkay's work in fire analytics.
The 5 Ps are: People, Process, Purpose, Platforms, Programmability
Hands On (Workshops)
MapReduce is the core programming model for the Hadoop ecosystem. We've found it's really helpful to walk through the steps of
MapReduce for yourself in order to internalize how it really works. In the lecture, we walked through the steps of MapReduce to
count words -- our keys were words. In this exercise, we'll have you count shapes -- the keys will be shapes.
Note: This assignment can be done in PPT and printed to PDF, or done on paper and submitted as a picture. Templates are available in PPT and
JPG.
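The shape-counting exercise can also be traced in code. The following is a minimal single-machine sketch in Python, not Hadoop itself: the function names, the two input splits, and the shape names are hypothetical and chosen only for illustration. It follows the same three steps as the word-count walkthrough (map, shuffle and sort, reduce), with shapes as the keys.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (shape, 1) pair for every shape in one input split."""
    return [(shape, 1) for shape in split]


def shuffle_phase(mapped_pairs):
    """Shuffle and sort: group all emitted values by key (the shape)."""
    groups = defaultdict(list)
    for shape, count in mapped_pairs:
        groups[shape].append(count)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the list of 1s collected for each shape."""
    return {shape: sum(counts) for shape, counts in groups.items()}


def count_shapes(splits):
    """Run map on every split, then shuffle and reduce the combined output."""
    mapped = []
    for split in splits:
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input: two splits, as if the data were stored on two nodes.
splits = [
    ["circle", "square", "circle"],
    ["triangle", "square", "circle"],
]
shape_counts = count_shapes(splits)
print(shape_counts)  # {'circle': 3, 'square': 2, 'triangle': 1}
```

In a real Hadoop job the map calls would run in parallel on the nodes holding each split, and the framework would perform the shuffle; here everything runs sequentially so each intermediate result can be inspected.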
Universidad Nacional Abierta y a Distancia (UNAD) - Vicerrectoría Académica y de Investigación (VIACI)
Quiz
Wi-Fi Networks
Social Media
The Internet
Individual, Unconnected Hospital Databases
2. What reasoning was given for the following: why is the data storage to price ratio relevant to big data?
Larger storage means easier accessibility to big data for every user because it allows users to download in bulk.
Access of larger storage becomes easier for everyone, which means client-facing services require very large data
storage.
It isn't, it was just an arbitrary example on big data usage.
Companies can't afford to own, maintain, and spend the energy to support large data storage unless the cost is
sufficiently low.
3. What is the best description of personalized marketing enabled by big data?
Being able to use the data from each customer for marketing needs.
Being able to obtain and use customer information for specific groups and utilize them for marketing needs.
Marketing to each customer on an individual level and suiting to their needs.
4. Of the following, which are some examples of personalized marketing related to big data?
Mobile advertising benefits from data integration with location which requires big data.
Since almost everyone owns a cell/mobile phone, the mobile advertising market is large and thus requires big data to
contain all the information.
Mobile advertising allows massive cellular/mobile texting to a wide audience, thus providing large amounts of data.
Mobile advertising in and of itself is always associated with big data.
7. What are the three types of diverse data sources?
Social Media
Weather station sensor output.
Sorted data from Amazon regarding customer info.
9. What is an example of organizational data?
10. Of the three data sources, which is the hardest to implement and streamline into a model?
Organizational Data
Machine Data
People
11. Which of the following summarizes the process of using data streams?
In the situation
The sensors used in airplanes to measure altitude.
Accelerometers.
Bringing the computation to the location of the data.
15. Which of the following are reasons mentioned for why data generated by people are hard to process?
Customer Satisfaction
Better Profit Margins
Improved Safety
High Velocity
Higher Sales
18. What are data silos and why are they bad?
Highly unstructured data. Bad because it does not provide meaningful results for organizations.
A giant centralized database to house all the data produced within an organization. Bad because it is hard to maintain
as highly structured data.
Data produced from an organization that is spread out. Bad because it creates unsynchronized and invisible data.
A giant centralized database to house all the data production within an organization. Bad because it hinders
opportunity for data generation.
Wi-Fi Networks
Social Media
The Internet
While the Internet may be enabling the easier collection and sharing of big data, in and of itself, it is not an example of
big data utilized in action today.
1. Amazon has been collecting review data for a particular product. They found that almost 90% of the reviews
gave a 5/5 rating. However, about 50% of those reviews came from customers who had no proof
of purchase or who did not post serious reviews of the product. Of the following, which is true about the
review data collected in this situation?
High Veracity
Low Valence
High Valence
High Volume
Low Veracity
Low Volume
2. As mentioned in the slides, what are the challenges to data with high valence?
Veracity
Value
Volume
Valence
Vision
Velocity
Variety
1. Which of the following are parts of the 5 P's of data science, or the additional P introduced in the slides?
Purpose
Process
People
Platforms
Perception
Programmability
Product
2. Which of the following are part of the four main categories to acquire, access, and retrieve data?
Remote Data
Web Services
Text Files
Traditional Databases
NoSQL Storage
3. What are the steps required for data analysis?
4. Of the following, what is a technique mentioned in the videos for building a model?
Analysis
Validation
Evaluation
Investigation
5. What is the first step in finding the right problem to tackle in data science?
Collect Data
Build In-House Expertise
Business Objectives
Organizational Buy-In
7. According to Ilkay, why is exploring data crucial to better modeling?
1. Which of the following is the best description of why it is important to learn about the foundations in big data?
High Concurrency
Large Storage
Data Scalability
High Fault Tolerance
5. Which of the following are general requirements for a programming language in order to support big data models?
Intro to Hadoop
Hardware Only
Computing Environment
Software On-Demand
2. What does PaaS provide?
Software On-Demand
Computing Environment
Hardware Only
3. What does SaaS provide?
Hardware Only
Software On-Demand
Computing Environment
4. What are the two key components of HDFS and what are they used for?
Low level deals with interactivity while high level deals with storage and scheduling.
Low level deals with storage and scheduling while high level deals with interactivity.
10. Which of the following are problems to look out for when you want to integrate your project with Hadoop?
11. As covered in the slides, which of the following are the major goals of Hadoop?
The texts for each of the readings required to prepare the concept maps are available at the
following link.
(https://github.com/Unad-BDAvanzadas/U3_Caso_Material_Estudio)
Instructor recommendations: The group's final work for each of the established phases must emerge from
the discussion, review, complementing, and consolidation of the products and contributions submitted individually.
There must be ongoing interaction and meaningful contributions within the group, in line with the role
each member has taken on, both in carrying out the collaborative work and in producing the deliverables
(the group's final product).
A single file containing the completed work must be submitted. The idea is to present a document that consolidates
the consensus or agreements reached from the individual proposals, which is different from merging (copying and pasting) everything
submitted, and also different from presenting just one of the individual contributions.
Use of the APA standard, third edition in Spanish (translation of the sixth edition in English)
The APA standards (Normas APA) are the most widely used style for organizing and presenting information in the social sciences.
They are published in a Manual that sets out how a scientific article should be presented.
Here you can find the most relevant aspects of the sixth edition of the APA Manual, such as
references, citations, the preparation and presentation of tables and figures, headings, and seriation, among others. You can see how to
apply them at http://normasapa.com/
Plagiarism policy: What is plagiarism for UNAD? Plagiarism is defined by the dictionary of the Real Academia as
the act of "substantially copying the works of others and passing them off as one's own". Plagiarism is therefore a serious offense: it is the
academic equivalent of theft. A student who plagiarizes does not take their education seriously and does not respect the
intellectual work of others.
There is no such thing as minor plagiarism. If a student uses any portion of another person's work and does not document the
source, that student is committing an act of plagiarism. Of course, we all rely on the ideas of others when
presenting our own, and our knowledge builds on the knowledge of others. But when we rely on
the work of others, academic honesty requires us to state explicitly that we are using an
external source, either through a citation or through an annotated paraphrase (these terms will be defined
later). When we cite or paraphrase, we clearly identify our source, not only to give
credit to its author, but also so that the reader can refer to the original if they wish.
There are academic circumstances in which, exceptionally, it is not acceptable to cite or paraphrase the work of others. For
example, if an instructor assigns a task that clearly asks students to respond
using exclusively their own ideas and words, the student must not draw on external sources, even if they
are properly referenced.