Sources of big structured data

  • Machine-generated: data that is created by a machine without human intervention.
  • Human-generated: data that humans supply while interacting with computers.




Examples of machine-generated sources: sensors such as RFID tags, smart meters, and GPS receivers in smartphones and tablets.

Structured Data

Data that has a defined length and format

Example: a customer's name, date of birth, address, and so on. Most experts agree that this kind of data accounts for about 20 percent of the data that is out there. Structured data is usually stored in a database and can be queried using a language such as Structured Query Language (SQL).
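
As a minimal sketch of what "defined length and format" means in practice, the following uses Python's built-in sqlite3 module to store and query customer records. The table name, column names, and sample rows are made up for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Structured data: every column has a declared name and type.
conn.execute(
    "CREATE TABLE customers (name TEXT, date_of_birth TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ada Example", "1990-04-12", "1 Sample Street"),
        ("Bob Example", "1985-11-30", "2 Sample Street"),
    ],
)

# Query by a field; ISO-formatted dates compare correctly as text.
rows = conn.execute(
    "SELECT name FROM customers WHERE date_of_birth < ? ORDER BY name",
    ("1988-01-01",),
).fetchall()
print(rows)
```

Because every row has the same fields, the query can filter on date_of_birth without inspecting each record's shape first.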

Advantages of NoSQL databases over SQL

No schema required:
Data can be inserted in a NoSQL database without first defining a rigid database schema. As a corollary, the format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.
Auto elasticity:
Many NoSQL databases automatically spread data across multiple servers without requiring application assistance. Servers can be added to or removed from the data layer without application downtime.
Integrated caching:
To increase data throughput and performance, advanced NoSQL techniques cache data in system memory. This is in contrast to SQL databases, where caching has to be done using separate infrastructure.
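
The "No schema required" point above can be illustrated with a toy in-memory collection. This is only a sketch: the collection list and the insert helper are hypothetical stand-ins, not the API of any real NoSQL product.

```python
import json

# A toy "collection": a list of JSON-like documents with no declared schema.
collection = []

def insert(doc):
    """Store a copy of the document as-is; no schema is checked."""
    collection.append(json.loads(json.dumps(doc)))

# The first version of the application stores two fields.
insert({"name": "Ada Example", "email": "ada@example.com"})

# Later the application starts recording a new field. No migration is needed.
insert({"name": "Bob Example", "email": "bob@example.com", "loyalty_tier": "gold"})

# Readers handle older documents by supplying a default for the missing field.
tiers = [doc.get("loyalty_tier", "none") for doc in collection]
print(tiers)  # ['none', 'gold']
```

The trade-off is that the application, rather than the database, becomes responsible for coping with documents of different shapes.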
Four popular types of NoSQL databases
Key-value stores. As the name implies, a key-value store is a system that stores values indexed for retrieval by keys. These systems can hold structured or unstructured data.
Column-oriented databases. Rather than store sets of information in a heavily structured table of columns and rows with uniform-sized fields for each record, as is the case with relational databases, column-oriented databases contain one extendable column of closely related data.
Document-based stores. These databases store and organize data as collections of documents, rather than as structured tables with uniform-sized fields for each record. With these databases, users can add any number of fields of any length to a document.
Graph databases. A graph database uses graph structures with nodes, edges, and properties to represent and store data.

Problems that Hadoop can solve

Hadoop-able Problems

1. Modeling true risk
2. Customer churn analysis
3. Recommendation engine
4. PoS transaction analysis
5. Analyzing network data to predict failure
6. Threat analysis
7. Search quality
8. Data “sandbox”

Four categories of NoSQL databases with examples (not just Big Data)

1. Key-value Stores
The main idea is a hash table in which each unique key points to a particular item of data. The key-value model is the simplest and easiest to implement, but it is inefficient when you only want to query or update part of a value, among other disadvantages.
Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak
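
A minimal sketch of the key-value idea, using a Python dict as the hash table. The put and get helpers and the "user:42" key are hypothetical; real stores such as Redis or Riak expose their own client APIs.

```python
import json

# A Python dict is a hash table: each unique key maps to one value.
store = {}

def put(key, value):
    # The store treats the value as an opaque blob (here, a JSON string).
    store[key] = json.dumps(value)

def get(key):
    return json.loads(store[key])

put("user:42", {"name": "Ada Example", "city": "Paris", "visits": 1})

# Updating one field means reading and rewriting the whole value,
# which is the inefficiency mentioned above.
profile = get("user:42")
profile["visits"] += 1
put("user:42", profile)

print(get("user:42")["visits"])  # 2
```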
2. Column Family Stores
These were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family.
Examples: Cassandra, HBase
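
The "keys that point to multiple columns, arranged by column family" layout can be sketched with nested dictionaries. This is an assumption-laden toy model, not how Cassandra or HBase store data on disk; the helper names and the "profile" and "activity" families are invented for illustration.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, column, value):
    table[row_key][family][column] = value

def get_family(row_key, family):
    """Return all columns of one family for a row."""
    return dict(table[row_key][family])

put("user:42", "profile", "name", "Ada Example")
put("user:42", "profile", "city", "Paris")
put("user:42", "activity", "last_login", "2024-01-01")

# Rows do not have to share the same set of columns.
put("user:43", "profile", "name", "Bob Example")

print(get_family("user:42", "profile"))
print(get_family("user:43", "profile"))
```

Grouping columns into families lets a reader fetch only the related columns it needs for a given row key.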
3. Document Databases
These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level of key-value stores, allowing nested values associated with each key. They also support more efficient querying.
Examples: CouchDB, MongoDB
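
To show what "nested values associated with each key" buys over a plain key-value store, here is a small sketch that queries documents by a nested field. The find helper and its dotted-path syntax are hypothetical and only loosely resemble what real document databases offer.

```python
# Each key maps to a semi-structured, JSON-like document with nested values.
documents = {
    "order:1": {
        "customer": {"name": "Ada Example", "city": "Paris"},
        "items": [{"sku": "A1", "qty": 2}],
    },
    "order:2": {
        "customer": {"name": "Bob Example", "city": "London"},
        "items": [{"sku": "B7", "qty": 1}],
    },
}

def find(path, expected):
    """Return ids of documents whose nested field at `path` equals `expected`."""
    matches = []
    for doc_id, doc in documents.items():
        value = doc
        for key in path.split("."):
            # Walk one level down; give up if the path leaves the dict structure.
            value = value.get(key) if isinstance(value, dict) else None
        if value == expected:
            matches.append(doc_id)
    return matches

print(find("customer.city", "Paris"))  # ['order:1']
```

Unlike the opaque values of a key-value store, the database can look inside each document, so a query can target part of a value. This sketch scans every document; real systems typically add indexes to avoid that.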
4. Graph Databases
Instead of tables of rows and columns and the rigid structure of SQL, a flexible graph model is used which, again, can scale across multiple machines. NoSQL databases generally do not provide a high-level declarative query language like SQL, which avoids overhead in processing. Rather, querying these databases is data-model specific. Many NoSQL platforms offer RESTful interfaces to the data, while others offer query APIs.

Examples: Neo4j, InfoGrid, InfiniteGraph
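
A minimal sketch of the graph model (nodes, edges, and properties) and of a data-model-specific query, written as a plain traversal. The node names, the KNOWS relationship, and both helper functions are invented for illustration and are not the API of any graph database listed above.

```python
# Nodes carry properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "carol": {"label": "Person", "age": 35},
}

# Edges are (source, relationship, target, properties) tuples.
edges = [
    ("alice", "KNOWS", "bob", {"since": 2019}),
    ("bob", "KNOWS", "carol", {"since": 2021}),
]

def neighbors(node, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst, _ in edges if src == node and rel == relationship]

def friends_of_friends(node):
    """Two-hop traversal: people known by the people this node knows."""
    result = set()
    for friend in neighbors(node, "KNOWS"):
        for fof in neighbors(friend, "KNOWS"):
            if fof != node:
                result.add(fof)
    return sorted(result)

print(friends_of_friends("alice"))  # ['carol']
```

The query is expressed as a walk along edges rather than as joins over tables, which is what "data-model specific" querying means here.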