Data Collection Methods
One form of data collection on the Internet uses search engine history. The data come from records of popular search topics and of people’s past searches. When someone searches for something, the query is automatically recorded and aggregated into usable data that can be interpreted easily, much as Google records search history and presents it as a graph.
Search history is collected so that the search engine can record what is currently popular or being discussed on the Internet. The benefit is the ability to track current media and the public’s views and interests at a given time. This data collection could cause harm because some people want to keep their search history private, and the information could leak at any time through a data breach. The popularity data can be accessed by anyone who uses Google: a user can look up popular searches and see a graph of the areas where a topic is most searched, along with each area’s level of interest relative to the rest of the world.
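The step from recorded searches to a graph of relative interest can be sketched in a few lines. This is only an illustration under stated assumptions: the search log, the region codes, and the function name relative_interest are made up, and scaling every region against the busiest one is one simple choice, not a description of how Google computes its graphs.

```python
from collections import Counter

# Hypothetical search log: (region, query) pairs, invented for illustration.
search_log = [
    ("US", "world cup"), ("US", "weather"), ("US", "world cup"),
    ("BR", "world cup"), ("BR", "world cup"), ("BR", "world cup"),
    ("JP", "weather"), ("JP", "world cup"),
]


def relative_interest(log, topic):
    """Score each region from 0 to 100 relative to the most interested region."""
    totals = Counter(region for region, _ in log)
    hits = Counter(region for region, query in log if query == topic)
    # Share of each region's searches that were about the topic.
    share = {region: hits[region] / totals[region] for region in totals}
    peak = max(share.values())
    if peak == 0:
        return {region: 0 for region in share}
    return {region: round(100 * s / peak) for region, s in share.items()}


interest = relative_interest(search_log, "world cup")
for region, score in sorted(interest.items()):
    print(region, score)
```

Note that the individual searches are no longer visible in the result, only the per-region totals. That is what makes the published graph less sensitive than the raw search history behind it.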
Another kind of digital data collection, from a source other than the Internet, takes place at an everyday bank. A bank uses cameras and other surveillance technology to record and collect data every second of the day to keep the area safe and secure. With cameras and similar technology, the bank can keep track of who comes in and what they do during their time in the bank.
Data about who comes into a bank and what they do are collected via cameras. The data are collected so that the bank has a record and proof of what happened in case a problem comes up, and also for the safety of the bank. If a bank robber were to come in, they would be recorded for further investigation. The benefit of this data collection is that customers can find out whether someone else withdrew their money: the footage records the time of the withdrawal and shows whether it was really them. This data might cause harm because, if it were ever to leak, valuable and private information would be exposed and could be misused. Footage can also be misleading and cause problems of its own. Others would most likely not be able to access this data easily, because it is sensitive, private information that is kept secure.
Another form of data collection performed in physical places is the collection of surveys at a supermarket or fast food chain. Most of the time, restaurants and supermarkets give their customers a receipt inviting them to take an optional survey for a small reward such as a cookie or a coupon. They then record the data they receive from these surveys.
These surveys record data on what people prefer and whether the customer service was satisfactory, as well as what customers would like to see in the future or what could be changed to better meet their needs. The data are collected so that companies can change their tactics to further satisfy customers, leading to more revenue and more people wanting to come back to their establishment. The benefit of this data collection is that anyone who wants to voice a concern or push for a change has a way to improve their experience at the establishment. This data collection could cause harm because anyone can respond, so people could quickly enter false data just for the reward. The data will not be easily accessible to the public, as they are meant for the company and its wellbeing.
Unstructured vs Structured Data
Raw, unstructured data can be overwhelming to use if there is too much of it, or may simply lack any organization that would make the data usable and useful. To make unstructured data more usable and useful, we apply structure and organization to it, usually after collection. However, structure and organization applied after collection alter the data through the selection and organization of pertinent details. Details not captured in the resulting organized set may be lost.
Unstructured data contain everything collected in "raw" form, but connections and relationships among strands of data are both harder to trace and much slower to process than in structured data sets. On the other hand, structured data are easy to access and organize, but may lack the big picture and the details that unstructured data may possess.
Every time we apply structure to an unstructured data set, it becomes more difficult (sometimes impossible) to gain back some of the unstructured data. Think about it this way:
- By turning trees into logs, we lose some "data" (branches, roots, and leaves).
- By turning logs into lumber, we lose some more "data" (wood chips, bark, and sawdust).
- By turning lumber into a barn, we lose even more "data" (wood chips and sawdust).
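The same loss can be shown with data instead of trees. The sketch below is a minimal illustration, assuming made-up survey comments and a hypothetical helper named to_structured that keeps only a 1 to 5 rating. It is not a recommended way to process real surveys.

```python
import re

# Hypothetical raw (unstructured) survey comments, stored exactly as collected.
raw_comments = [
    "Service was slow but the cookies were great, 4 out of 5",
    "2/5. Cashier was rude and the store felt too cold.",
]


def to_structured(comment):
    """Keep only a 1-5 rating; everything else in the comment is discarded."""
    match = re.search(r"([1-5])\s*(?:/|out of)\s*5", comment)
    rating = int(match.group(1)) if match else None
    return {"rating": rating}


structured = [to_structured(c) for c in raw_comments]
print(structured)
```

The structured records are easy to average or chart, but the slow service, the rude cashier, and the cold store cannot be recovered from them. Those details survive only if the raw comments are kept as well.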
Common misconception: "Unstructured data" has no structure.
- At some level, there is structure to even "unstructured data." For example, the binary representation and the particular encoding used for text, video, etc. are forms of structure. "Structure" in the sense of "structured data" means that some level of organization according to their intended use has been applied to the data before they are stored. With unstructured data, data are stored as collected with any necessary structure applied later during analysis.
Extraction and Internet Data Structure
Google:
http://wowwiki.wikia.com/wiki/World_of_Warcraft:_Wrath_of_the_Lich_King
~5 seconds
DMOZ:
https://en.wikipedia.org/wiki/World_of_Warcraft:_Wrath_of_the_Lich_King
~35 seconds
Google offered quicker search times and a more user-friendly search engine: all you have to do is search for a key term and pick the website you want. DMOZ instead makes the user browse through successive sub-sections until the listing narrows down to the specific subject being sought. DMOZ uses structured data, since its information is organized into a database of categories that makes entries easy to find. Google, by contrast, uses unstructured data: it has no database of categories for the user to sift through, but a search engine that finds the key term and brings up small pieces of information in its search results. Although DMOZ takes longer and is more tedious, its results appear to be of better quality because it narrows the search down to the exact area of information you are looking for. Google has an obvious advantage in quantity, but the quality of its results is not on par with DMOZ’s, as they are vaguer and broader because of the large search database it has to go through.
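The difference between browsing a directory and searching by key term can be sketched with two toy lookups. Everything here is an assumption made for illustration: the category names, the page texts, and the helper functions are invented, and neither lookup describes how DMOZ or Google is actually built.

```python
# Hypothetical mini directory, organized by category in the style of a web directory.
directory = {
    "Games": {
        "Video Games": {
            "Roleplaying": ["World of Warcraft: Wrath of the Lich King"],
        },
    },
    "Science": {"Biology": ["Tree anatomy"]},
}

# Hypothetical page texts used for the key-term lookup.
pages = {
    "World of Warcraft: Wrath of the Lich King": "expansion for the game world of warcraft",
    "Tree anatomy": "branches roots and leaves of a tree",
}


def browse(tree, path):
    """Follow one category per step, the way a directory user clicks down."""
    node = tree
    for category in path:
        node = node[category]
    return node


def build_index(texts):
    """Map each word to the set of page titles whose text contains it."""
    index = {}
    for title, text in texts.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(title)
    return index


def search(index, query):
    """Return the titles whose text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results


print(browse(directory, ["Games", "Video Games", "Roleplaying"]))
print(search(build_index(pages), "warcraft expansion"))
```

Browsing takes one step per category and only works if the user knows which branch to follow, which matches the slower but narrower directory experience described above. The key-term lookup answers in a single step but returns whatever pages happen to contain the words.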