Tuesday, 26 June 2012

Books and papers to study for cloud computing

1. nosqldbs-NOSQL Introduction and Overview
2. system and method for data distribution(2009)
3. System and method for large-scale data processing using an application-independent framework(2010)
4. MapReduce: Simplified Data Processing on Large Clusters;
5. MapReduce-- a flexible data processing tool(2010)
6. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
7. MapReduce and Parallel DBMSs--Friends or Foes(2010)
8. Presentation:MapReduce and Parallel DBMSs:Together at Last (2010)
9. Twister: A Runtime for Iterative MapReduce(2010)
10. MapReduce Online(2009)
11. Megastore: Providing Scalable, Highly Available Storage for Interactive Services (2011,CIDR)
12. Interpreting the Data:Parallel Analysis with Sawzall
13. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (technical  report 2010)
14. Large-scale Incremental Processing Using Distributed Transactions and Notifications(2010)
15. Improving MapReduce Performance in Heterogeneous Environments
16. Dremel: Interactive Analysis of WebScale Datasets(2011)
17. Large-scale Incremental Processing Using Distributed Transactions and Notifications
18. Chukwa: a scalable cloud monitoring System (presentation)
19. The Chubby lock service for loosely-coupled distributed systems
20. Paxos Made Simple(2001,Lamport)
21. Fast Paxos(2006)
22. Paxos Made Live - An Engineering Perspective(2007)
23. Classic Paxos vs. Fast Paxos: Caveat Emptor
24. On the Coordinator’s Rule for Fast Paxos(2005)
25. Paxos  made code:Implementing a high throughput Atomic Broadcast (2009)
26. Bigtable: A Distributed Storage System for Structured Data(2006)
27. The Google File System

Google patent papers
1. Data processing system and method for financial debt instruments(1999)
2. Data processing system and method to enforce payment of royalties when copying softcopy books(1996)
3. Data processing systems and methods(2005)
4. Large-scale data processing in a distributed and parallel processing environment(2010)
7. System and method for maintaining replicated data coherency in a data processing system(1995)
8. System and method of using data mining prediction methodology(2006)
9. System and Methodology for Data Processing Combining Stream Processing and spreadsheet computation(2011)
10. Patent Factor index report of system and method of using data mining prediction methodology
11. Pregel: A System for Large-Scale Graph Processing(2010)

1. A simple totally ordered broadcast protocol
2. ZooKeeper: Wait-free coordination for Internet-scale systems
3. Zab: High-performance broadcast for primary-backup systems(2011)
4. wait-free syschronization(1991)
6. Wait-free clock synchronization(ps format)
7. Programming with ZooKeeper - A basic tutorial
8. Hive – A Petabyte Scale Data Warehouse Using Hadoop
9. Thrift: Scalable Cross-Language Services Implementation(Facebook)
10. Hive other files: HiveMetaStore class picture, Chinese docs
11. Scaling out data preprocessing with Hive (2011)
12. HBase The Definitive Guide - 2011
13. Nova: Continuous Pig/Hadoop Workflows(yahoo,2011)
14. Pig Latin: A Not-So-Foreign Language for Data Processing(2008)
15. Analyzing Massive Astrophysical Datasets: Can Pig/Hadoop or a Relational DBMS Help?(2009)
a. Some docs about HStreaming,Zebra
16. HIPI: A Hadoop Image Processing Interface for Image-based MapReduce Tasks
17. System Anomaly Detection in Distributed Systems through MapReduce-Based Log Analysis(2010)
18. Benchmarking Cloud Serving Systems with YCSB(2010)
19. Low-Latency, High-Throughput Access to Static Global Resources within the Hadoop Framework (2009)

SmallFile Combine in hadoop world
1. TidyFS: A Simple and Small Distributed File System(Microsoft)
2. Improving the storage efficiency of small files in cloud storage(chinese,2011)
3. Comparing Hadoop and Fat-Btree Based Access Method for Small File I/O Applications(2010)
4. RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems(Facebook)
5. A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: a Case Study by PowerPoint Files(IBM,2010)

Job schedule
1. Job Scheduling for Multi-User MapReduce Clusters(Facebook)
2. MapReduce Scheduler Using Classifiers for Heterogeneous Workloads(2011)
3. Performance-Driven Task Co-Scheduling for MapReduce Environments
4. Towards a Resource Aware Scheduler in Hadoop(2009)
5. Delay Scheduling: A Simple Technique for Achieving
6. Locality and Fairness in Cluster Scheduling(yahoo,2010)
7. Dynamic Proportional Share Scheduling in Hadoop(HP)
8. Adaptive Task Scheduling for MultiJob MapReduce Environments(2010)
9. A Dynamic MapReduce Scheduler for Heterogeneous Workloads(2009)

1. HStreaming Cloud Documentation
2. S4: Distributed Stream Computing Platform(yahoo,2010)
3. Complex Event Processing(2009)
4. Hstreaming : http://www.hstreaming.com/resources/manuals/
5. StreamBase: http://streambase.com/developers-docs-pdfindex.htm
6. Twitter storm: http://www.infoq.com/cn/news/2011/09/twitter-storm-real-time-hadoop
7. Bulk Synchronous Parallel(BSP) computing
8. MPI

1. Aster Data whilepaper:Deriving Deep Insights from Large Datasets with SQL-MapReduce (2004)
2. SQL/MapReduce: A practical approach to self-describing,polymorphic, and parallelizable user-defined functions(2009,aster)
3. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads(2009)
4. HadoopDB in Action: Building Real World Applications(2010)
5. Aster Data presentation: Making Advanced Analytics on Big Data Fast and Easy(2010)
6. A Scalable, Predictable Join Operator for
7. Highly Concurrent Data Warehouses(2009)
8. Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce(2010)
9. Greenplum whilepaper:A Unified Engine for RDBMS and MapReduce(2004)
10. A Comparison of Approaches to Large-Scale Data Analysis(2009)
11. MAD Skills: New Analysis Practices for Big Data (2009)
12. C Store A Column oriented DBMS(2005)
13. Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations(Microsoft)

1. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (2007)

1. Dynamo: Amazon’s Highly Available Key-value Store(2007)
2. Efficient Reconciliation and Flow Control for Anti-Entropy Protocols
3. The Eucalyptus Open-source Cloud-computing System
4. Eucalyptus: An Open-source Infrastructure for Cloud Computing(presentation)
5. Eucalyptus : A Technical Report on an Elastic Utility Computing Archietcture Linking Your Programs to Useful Systems (2008)
6. Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms(2011)
7. Database-Agnostic Transaction Support for Cloud Infrastructures
8. CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems(2011)
9. ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures

1. Distributed Systems Concepts and Design (5th Edition)
2. Principles of Computer Systems (7-11)
3. Distributed system(chapter)
4. Data-Intensive Text Processing with MapReduce (2010)
5. Hadoop in Action
6. 21 Recipes for Mining Twitter
7. Hadoop.The.Definitive.Guide.2nd.Edition
8. Pro hadoop

Other papers about Distributed system
1. Flexible Update Propagation for Weakly Consistent Replication(1997)
2. Providing High Availability Using Lazy Replication(1992)
3. Managing Update Conflicts in Bayou,a Weakly Connected Replicated Storage System(1995)
4. XMIDDLE: A Data-Sharing Middleware for Mobile Computing(2002)
5. design and implementation of sun network filesystem
6. Chord: A Scalable Peertopeer Lookup Service for Internet Applications(2001)
7. A Survey and Comparison of Peer-to-Peer Overlay Network Schemes(2004)
8. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing(2001)

1. 21 Recipes for Mining Twitter(Book)
2. Web Data Mining(Book)
3. Web Mining and Social Networking(Book)
4. mining the social web(book)
6. Social Network Analysis and Mining for Business Applications(yahoo,2011)
7. Data Mining in Social Networks(2002)
8. Natural Language Processing with Python(book)
9. data_mining-10_methods(Chinese editation)
10. Mahout in Action(Book)
11. Text Mining Infrastructure in R(2008)
12. Text Mining Handbook(2010)

Web search engine
1. Building Efficient Multi-Threaded Search Nodes(Yahoo,2010)
2. The Anatomy of a Large-Scale Hypertextual Web Search Engine(google)


  1. Learning new technology would give oneself a true confidence in the current emerging Information Technology domain. With the knowledge of big data the most magnificent cloud computing technology one can go the peek of data processing. As there is a drastic improvement in this field everyone are showing much interest in pursuing this technology. Your content tells the same about evolving technology. Thanks for sharing this.

    Hadoop Course in Chennai

  2. Processing data was tough long back without the invention of big data. Under to incredible methodology any data can be processed at maximum speed at minimal time. You are maintaining a wonderful blog, and thanks for sharing this information in here.

    Hadoop Training Chennai

  3. Well Said, you have furnished the right information that will be useful to anyone at all time. Thanks for sharing your Ideas.
    Salesforce training in Chennai | salesforce course in Chennai

  4. Excellent post!!! In this competitive market, customer relationship management plays a significant role in determining a business success. That too, cloud based CRM product offer more flexibility to business owners to main strong relationship with the consumers.
    cloud computing training in chennai|Cloud computing course in Chennai|cloud training in chennai

  5. Im no expert, but I believe you just made an excellent You certainly understand what youre speaking about, and I can truly get behind that.
    Salesforce Training in Chennai
    Salesforce Training

  6. I ever had seen this information over the blog sites; actually I am looking forward for this information. Here I had an opportunity to read, it was crystal clear keep sharing…I have an expectation about your upcoming post.
    Salesforce Administrator 201 Training in Chennai|Salesforce Administrator 211 Training in Chennai|Salesforce Training in Chennai

  7. for preparing bank exam and group exam , we are providing an online test model questions papers

    Group Exam Questions and Answers
    Bank Exam Questions and Answers

  8. I just want to say that all the information you have given here is awesome...great and nice blog thanks sharing..Thank you very much for this one. And i hope this will be useful for many people.. and i am waiting for your next post keep on updating these kinds of knowledgeable things...
    Web Design Development Company
    Web design Company in Chennai
    Web development Company in Chennai

  9. Thank you a lot for providing individuals with a very spectacular possibility to read critical reviews from this site.

    Best Java Training Institute Chennai

    Amazon Web Services Training in Chennai