Storage layer:
The storage layer is usually loaded with data using a batch process.
The integration component of the ingestion layer invokes various mechanisms, such as Sqoop, MapReduce jobs, ETL jobs, and others, to upload data to the distributed Hadoop storage layer (DHSL).
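As a sketch of one such mechanism, a batch upload of an RDBMS table into HDFS might be driven by a Sqoop import. The connection string, table name, and target directory below are placeholder assumptions; the command is printed rather than executed, since the real run assumes a live Hadoop cluster:

```shell
# Placeholder connection details -- substitute real values for your source RDBMS.
DB_URL="jdbc:mysql://dbhost:3306/sales"
TABLE="orders"
TARGET_DIR="/data/raw/orders"

# Build the Sqoop import command the ingestion layer would invoke to load
# the table into HDFS, split across 4 parallel map tasks.
SQOOP_CMD="sqoop import --connect $DB_URL --table $TABLE --target-dir $TARGET_DIR --num-mappers 4"

# Print the command so the sketch stands alone without a cluster.
echo "$SQOOP_CMD" | tee sqoop_cmd.txt
```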
Hadoop:
Components:
1. HDFS
2. MapReduce
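The MapReduce shape (map, shuffle, reduce) can be mimicked in a few lines of shell, which is a common way to prototype a word count before writing a real job; the input file here is a made-up stand-in for data on HDFS:

```shell
# A two-line input file stands in for a file stored on HDFS.
printf 'big data\nbig hadoop\n' > input.txt

# Map: emit one word per line.  Shuffle: sort brings equal keys together.
# Reduce: uniq -c counts each group.  Same shape as a MapReduce word count.
tr ' ' '\n' < input.txt | sort | uniq -c | awk '{print $2, $1}' > counts.txt
cat counts.txt
```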
Storage pattern:
Legacy Data Sources:
HTTP/HTTPS web services
RDBMS
FTP
JMS/MQ-based services
Text/flat file/CSV logs
XML data sources
IM protocol requests
New Age Data Sources:
High Volume Sources
1. Switching devices data |
2. Access point data messages |
3. Call data records, due to exponential growth in the user base
4. Feeds from social networking sites |
Variety of Sources
1. Image and video feeds from social networking sites
2. Transaction data |
3. GPS data |
4. Call center voice feeds |
5. E-mail |
6. SMS |
High Velocity Sources
1. Call data records |
2. Social networking site conversations |
3. GPS data |
4. Call center voice-to-text feeds
Debug steps:
Error Message From BI Security Service: [nQSError: 46164] HTTP Server returned 404 Authentication failed: invalid user/password.
Resolution:
Analyzing structured data
Analyzing unstructured data
Introduction
Traditional BI vs Big Data
Data is retained in a distributed file system instead of on a central server.
The processing functions are taken to the data, rather than the data being taken to the functions.
Data is of different formats, both structured as well as unstructured.
Data is both real-time data as well as offline data.
Technology relies on massively parallel processing (MPP) concepts.
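A toy shell sketch of "functions to the data": the same counting function runs against each data partition in parallel, and only the small partial results travel back to be merged. The partition files and their contents are made up for illustration; real MPP systems distribute this across nodes, not background processes:

```shell
# Two "partitions", standing in for data blocks living on different nodes.
printf 'big data\n' > part0.txt
printf 'big hadoop\n' > part1.txt

# Run the counting function against each partition in parallel ...
for p in part0.txt part1.txt; do
  ( tr ' ' '\n' < "$p" | sort | uniq -c > "$p.out" ) &
done
wait

# ... then merge only the small partial counts, not the raw data.
cat part0.txt.out part1.txt.out \
  | awk '{count[$2] += $1} END {for (w in count) print w, count[w]}' \
  | sort > merged.txt
cat merged.txt
```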
Use
Energy companies monitor and combine usage data recorded from smart meters in real time to provide better service to their consumers and improved uptime.
Web sites and television channels are able to customize their advertisement strategies based on viewer household demographics and program viewing patterns.
Fraud-detection systems analyze behaviors and correlate activities across multiple data sets, including social media data.
High-tech companies are using big data infrastructure to analyze application logs to improve troubleshooting, decrease security violations, and perform predictive application maintenance.
Social media content analysis is being used to assess customer sentiment and improve products, services, and customer interaction.
JMeter
It's a tool used for performance testing, but since it's freely available we can also use it for ordinary functional testing.
Jenkins
It's a tool that can be used to automate tasks.
Integrate Jenkins and JMeter
JMeter ships with a non-GUI command-line mode. Jenkins should invoke a shell script, which in turn invokes JMeter, collects the results, and displays them on the Jenkins console.
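A minimal sketch of that shell step, assuming jmeter is on the PATH and webservice_test.jmx is a hypothetical test plan; the command is printed rather than executed so the sketch stands alone:

```shell
JMX_FILE="webservice_test.jmx"   # hypothetical test plan name
RESULT_FILE="results.csv"

# -n = non-GUI mode, -t = test plan, -l = file to log sample results to.
JMETER_CMD="jmeter -n -t $JMX_FILE -l $RESULT_FILE"

# A Jenkins "Execute shell" step would run this command; echoing it first
# means the Jenkins console shows exactly what was invoked.
echo "$JMETER_CMD" | tee jmeter_cmd.txt
```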
Looks simple?
Idea
You can create one JMX per web service, or one for multiple web services, depending on your requirement. Now, the main problem we usually face in automation tasks is parameterisation: we want to keep some parameters for run time, parameters we cannot hard-code. This is what actually makes it automation. So first, identify the parameters you want to supply at run time.
Now, when you trigger the build from Jenkins, it should prepare those parameters in the form JMeter requires; you can write a *nix script for that if needed.
Once the parameters are ready, your script invokes JMeter. There is a twist here: a JMeter JMX file can accept input at run time in two ways:
1. From CSV data files (via a CSV Data Set Config element).
2. From properties passed on the command line (with -J, read inside the JMX via the __P function).
Whichever way you choose, make sure you have prepared your JMX file accordingly in advance.
For example, if you want to use option-1, you can create the CSVs at run time from your scripts, but make sure the JMX points to the correct path and file name.
And if you want to go with option-2, there is a different syntax you will have to use in the JMX in order to read the variables passed at run time.
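Both options can be sketched in a few lines of shell. The file name users.csv, its columns, and the base_url property are assumptions; they must match whatever the JMX actually references:

```shell
# Option 1: generate, at run time, the CSV that the JMX's CSV Data Set
# Config element will read.  Columns must match the variable names in the JMX.
cat > users.csv <<'EOF'
username,password
alice,secret1
bob,secret2
EOF

# Option 2: pass values as JMeter properties with -J; inside the JMX they
# are read with the property function, e.g. ${__P(base_url)}.
BASE_URL="https://example.test/api"
JMETER_ARGS="-Jbase_url=$BASE_URL -Jusers_file=users.csv"
echo "$JMETER_ARGS" | tee jmeter_args.txt
```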
The point is, you can customize this however you want; these are the ways to provide parameters at run time, which I think matters most in any automation. The GUI we get from Jenkins, preparation and invocation we do from shell scripts, and the testing is done by JMeter.
Output
We have written much about how to test a thing; here is how to get the results. Use JMeter's result listeners to write as many result CSV files as you want, with different names, then read these files from your script and display them on the Jenkins console. You can even keep the file names and locations dynamic using the two options explained above.
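For example, a short awk pass can turn a result CSV into a console summary. The rows below are made-up samples following JMeter's default CSV column order (timeStamp, elapsed, label, responseCode, ...):

```shell
# Made-up sample rows in JMeter's default CSV result layout.
cat > results.csv <<'EOF'
timeStamp,elapsed,label,responseCode
1000,120,Login,200
1001,80,Login,200
1002,300,Search,500
EOF

# Per-label average response time and failure count, for the Jenkins console.
awk -F, 'NR > 1 { n[$3]++; t[$3] += $2; if ($4 != 200) f[$3]++ }
         END { for (l in n) printf "%s avg=%dms fail=%d\n", l, t[l]/n[l], f[l]+0 }' \
    results.csv > summary.txt
cat summary.txt
```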
Performance testing
Jenkins even supports performance output; great, isn't it? You need to install and configure a plugin for that. Find more details here:
https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin