A Novel Method for Data Hosting and Load Balancing in Multi Cloud Environment

In recent years there has been a rapid shift toward online data hosting
services, and many cloud service providers now offer them. Data hosting means
storing data on a server or other computer so that it can be accessed over the
internet. Companies that need particular resources for only a limited period of
time do not have to purchase those resources. Cloud storage can be understood
as a service model in which data is maintained, managed, and backed up remotely
and made available to users over a network (typically the Internet). Companies
can use resources over the network on a pay-per-use basis.

Cloud computing provides different types of services to users over the network.
It enables companies to consume resources as a utility, just like electricity.
Data hosting services give users an efficient and reliable way to store data,
and the stored data can be accessed from anywhere, on any device, at any time.
Cloud computing is internet-based computing that provides on-demand access to a
shared pool of resources and data on a pay-per-use basis. It also provides a
distributed environment, which is essential for developing large-scale
applications rapidly.

There are three main cloud-based storage architecture models:

·       Public

·       Private

·       Hybrid

The public cloud storage model provides a multi-tenant storage environment that
is best suited to unstructured data. In this architecture, data is stored in
global data centers and distributed across multiple regions.

The private cloud storage model provides a dedicated environment in which data
is protected behind an organization’s firewall. Private clouds are appropriate
for users who need more security for, and more control over, their data.

A hybrid cloud is a combination of a private cloud and third-party public cloud
services. The model offers flexibility and more data deployment options.
Recently, a growing number of customers have adopted the hybrid cloud model.

As data hosting services have become more popular, many cloud service providers
have begun to offer them. In most cases, companies host their data in a single
cloud, even though several options are now available from various cloud
vendors.

Heterogeneous clouds:

Cloud vendors vary in performance and pricing policy. They are built on
different system architectures and apply different techniques to provide better
services. As a result, customers find it hard to judge which clouds are
suitable for hosting their data. Hosting all of an organization's data in a
single cloud is also inefficient: it exposes the organization to vendor lock-in
risk and does not provide guaranteed availability.

Multi Cloud data hosting:

Multi-cloud data hosting distributes data across multiple clouds to increase
availability and to minimize the risk of data loss or system failure caused by
the failure of a centralized component in a cloud computing environment. Such a
failure can occur in hardware, software, or infrastructure. This strategy also
improves overall enterprise performance by avoiding potential risks such as
“vendor lock-in”.

 SYSTEM STUDY & ANALYSIS

   

 

EXISTING SYSTEM

In existing cloud data hosting systems, data availability is usually guaranteed
by replication or erasure coding. Both mechanisms are also used in the
multi-cloud environment to meet different availability requirements, but their
implementations differ considerably. With replication, redundant replicas are
placed in several clouds, and a read request is served by the “cheapest” cloud,
that is, the one that charges the least for outgoing bandwidth and GET
operations, unless that cloud is unavailable. Data replication is suitable for
systems with distributed applications. With erasure coding, data is divided
into m data blocks and encoded into n blocks: the m data blocks and the n-m
coding blocks are placed in n different clouds. Compared with replication,
erasure coding guarantees data availability with less storage space, but a read
must access multiple clouds that store the corresponding data blocks, so it
cannot be served by the cheapest cloud alone as in replication. This matters
because, in the multi-cloud scenario, bandwidth is generally (much) more
expensive than storage space. Two problems arise in the multi-cloud setting:

·       How to choose appropriate clouds, in the presence of heterogeneous
pricing policies, so as to minimize monetary cost.

·       How to meet the different availability requirements of different
hosting services.
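
The availability side of the replication versus erasure coding trade-off described above can be illustrated with a small calculation. This is a minimal sketch, not part of the original scheme: it assumes that every cloud has the same availability `a` and that clouds fail independently, and the function names are hypothetical.

```python
from math import comb


def replication_availability(a: float, k: int) -> float:
    """Data is readable if at least one of the k replicas is reachable."""
    return 1.0 - (1.0 - a) ** k


def erasure_availability(a: float, m: int, n: int) -> float:
    """Data is readable if at least m of the n blocks are reachable."""
    return sum(
        comb(n, i) * a**i * (1.0 - a) ** (n - i)
        for i in range(m, n + 1)
    )


def replication_overhead(k: int) -> float:
    """Storage used, as a multiple of the original data size."""
    return float(k)


def erasure_overhead(m: int, n: int) -> float:
    """n blocks are stored, each 1/m of the original data size."""
    return n / m
```

Under these assumptions, with a per-cloud availability of 0.9, two replicas give an availability of about 0.99 at 2x storage, while a 2-of-3 erasure code gives about 0.972 at 1.5x storage. Setting m = 1 makes the erasure-coding formula reduce to the replication formula.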

 PROBLEM STATEMENT

To host data in multiple clouds, users encounter two critical problems:

Ø  How to choose appropriate clouds, in the presence of heterogeneous pricing
policies, to minimize monetary cost.

Ø  How to meet different availability requirements in order to provide
different services.

Ø  Monetary cost depends mainly on data usage, particularly the amount of
storage capacity and network bandwidth consumed.

Ø  For the availability requirement, the consideration is which redundancy
mechanism (i.e., replication or erasure coding) is more economical for a
specific data access pattern.

Ø  How to balance the load when multiple clouds are active.

Ø  How to identify the best data centre for hosting a given input. The
selection is based on the resources currently allocated, the size of the data
centre, the input file size, and the load on the centre.
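
The data-centre selection criteria in the last item can be sketched as follows. This is only an illustration under stated assumptions: the `DataCenter` fields, the 0-to-1 load scale, and the tie-breaking rule are all hypothetical, since the text names the inputs but not how they are combined.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataCenter:
    name: str
    capacity_gb: float   # size of the data centre
    allocated_gb: float  # resources currently allocated
    load: float          # current load, 0.0 (idle) to 1.0 (saturated)


def pick_data_center(
    centers: Sequence[DataCenter], file_gb: float
) -> Optional[DataCenter]:
    """Return the least-loaded centre that can still hold the file."""
    # Keep only the centres with enough free space for the input file.
    candidates = [
        dc for dc in centers
        if dc.capacity_gb - dc.allocated_gb >= file_gb
    ]
    if not candidates:
        return None
    # Prefer the lowest load; break ties by the utilisation the centre
    # would have after the file is placed (assumed rule).
    return min(
        candidates,
        key=lambda dc: (dc.load, (dc.allocated_gb + file_gb) / dc.capacity_gb),
    )
```

A lightly loaded centre is skipped when it has no room for the file, so the choice depends on both the load and the input size.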

 

PROPOSED SYSTEM

We propose a novel, cost-efficient data hosting scheme with high availability
for heterogeneous multi-cloud environments, based on a predictor model. It
places data into multiple clouds with minimized monetary cost and guaranteed
availability. Specifically, we combine the two widely used redundancy
mechanisms, replication and erasure coding, into a uniform model to meet the
required availability in the presence of different data access patterns. We
then design an efficient predictor algorithm to choose proper data storage
modes, involving both clouds and redundancy mechanisms (ERREPLICA).
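
To see why the storage mode should depend on the access pattern, the monetary cost of the two mechanisms can be compared for a given read frequency. The following is a minimal sketch under simplifying assumptions: the prices are made up, per-request GET charges are ignored, a replicated read is served entirely by the replica with the cheapest outgoing bandwidth, and an erasure-coded read fetches m blocks from the m clouds with the cheapest bandwidth. The `Cloud` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cloud:
    name: str
    storage_price: float    # $ per GB per month (made-up value)
    bandwidth_price: float  # $ per GB of outgoing traffic (made-up value)


def replication_cost(clouds: Sequence[Cloud], size_gb: float, reads: int) -> float:
    """Monthly cost of keeping one full replica in each given cloud."""
    storage = size_gb * sum(c.storage_price for c in clouds)
    # Every read is served by the replica with the cheapest bandwidth.
    read = reads * size_gb * min(c.bandwidth_price for c in clouds)
    return storage + read


def erasure_cost(clouds: Sequence[Cloud], m: int, size_gb: float, reads: int) -> float:
    """Monthly cost of an (m, n) code with one block per given cloud."""
    block_gb = size_gb / m
    storage = block_gb * sum(c.storage_price for c in clouds)
    # Every read fetches m blocks from the m cheapest clouds.
    cheapest = sorted(c.bandwidth_price for c in clouds)[:m]
    read = reads * block_gb * sum(cheapest)
    return storage + read


clouds = [
    Cloud("A", 0.03, 0.05),
    Cloud("B", 0.02, 0.12),
    Cloud("C", 0.025, 0.10),
]
for reads in (0, 10):
    rep = replication_cost(clouds[:2], size_gb=1.0, reads=reads)
    era = erasure_cost(clouds, m=2, size_gb=1.0, reads=reads)
    print(reads, "reads:", "replication" if rep < era else "erasure coding")
```

With these made-up prices, a 1 GB object that is never read is cheaper under 2-of-3 erasure coding (about $0.0375 versus $0.05 per month), while at 10 reads per month replication on clouds A and B is cheaper (about $0.55 versus $0.79). Rarely read data favours erasure coding and frequently read data favours replication, which is the decision a predictor of access frequency would feed.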

Existing systems focus mainly on combining replication and erasure coding; they
do not specify a method for the predictor. Many prediction algorithms exist,
such as the weighted moving average method, and some approaches build a
classifier to predict the access frequency of files. In the proposed method, we
build the predictor using data mining algorithms. Because many data centers
generate enormous log files, the input is huge, and we need an algorithm that
can handle data at that scale.
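
For reference, the weighted moving average method mentioned above can be sketched in a few lines. This is a generic illustration of that baseline, not the predictor proposed here; the function name and the convention that the last weight applies to the most recent observation are assumptions.

```python
from typing import Sequence


def weighted_moving_average(
    history: Sequence[float], weights: Sequence[float]
) -> float:
    """Predict the next value from the last len(weights) observations.

    weights[-1] applies to the most recent observation, so larger
    trailing weights make the forecast follow recent activity.
    """
    if not weights or len(history) < len(weights):
        raise ValueError("need at least len(weights) observations")
    window = history[-len(weights):]
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)
```

For example, with daily read counts of 10, 20 and 30 and weights 1, 2 and 3, the forecast for the next day is (10 + 40 + 90) / 6, or roughly 23.3 reads.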

 

Advantages:

1.      Selects the best cloud for data hosting, balancing the load in a
cost-effective way.

2.      Uses a replication mechanism for high availability.

3.      Handles large volumes of log information and quickly identifies the
best cloud, which makes it suitable for a cloud environment.

4.      Uses an efficient predictor to decide the storage mode and a suitable
cloud data center.

5.      Saves monetary costs.

We use a split algorithm to predict a data centre with less load for the next
allocation. Once a data centre is allocated, we apply the ERREPLICA method
based on the statistics given by the predictor.

Stage1 // Predictor

MakeTree(Training Data T)
    Partition(T)

Partition(Data S)
    if (all points in S are in the same class) then return;
    Evaluate Splits for each attribute A;
    Use best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);

for each attribute A do
    traverse attribute list of A
    for each value v in the attribute list do
        find the corresponding entry in the class list, and hence the
            corresponding class and the leaf node l
        update the class histogram in the leaf l
        if A is a numeric attribute then
            compute splitting index for test (A ≤ v) for l
    if A is a categorical attribute then
        for each leaf of the tree do
            find subset of A with best split

// Stage2 Distribute
// choosing a Storage Mode and allocating to a best Data Centre.

The Algorithm
    Setup (n datacenters)
    Alloc(m)          // Allocate m blocks to each dc
    Compute Load for each Data Center.
    Choose a datacenter based on load
    For k = 1 to n
        Check the availability of kth dc suitable for µ
        If µ = sflag
            Allocate to K
        Else
            Ealloc (n, µ)
    End

// Algorithm for partitioning and choosing a suitable cloud with least cost.

Ealloc (n, µ)
    // The output is minimum cost C, the set of the selected clouds H.
    1. C ← inf;
    2. H = {}             // initially empty.
    3. Sort the clouds by S + µ    // Accessibility
    4. for m = 1 to n do
           A ← calculate the availability of G
           If A
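
The "compute splitting index" step of Stage 1 can be illustrated for a single numeric attribute. The text does not say which splitting index is used, so this sketch assumes the Gini index, a common choice for decision-tree splits; the function names and the example attribute are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(counts: Counter, total: int) -> float:
    """Gini impurity of a class histogram holding `total` points."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(
    values: Sequence[float], labels: Sequence[Hashable]
) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted_gini) of the best test value <= threshold.

    Returns None when all values are equal and no split is possible.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    left: Counter = Counter()                              # histogram for value <= v
    right: Counter = Counter(label for _, label in pairs)  # histogram for value > v
    best: Optional[Tuple[float, float]] = None
    for i in range(n - 1):
        v, label = pairs[i]
        # Move one point from the right histogram to the left one.
        left[label] += 1
        right[label] -= 1
        if v == pairs[i + 1][0]:
            continue  # cannot split between two equal values
        n_left = i + 1
        score = (
            n_left * gini(left, n_left) + (n - n_left) * gini(right, n - n_left)
        ) / n
        if best is None or score < best[1]:
            best = (v, score)
    return best
```

A single pass over the sorted attribute list updates the two class histograms incrementally, which mirrors the histogram update in the pseudocode. For instance, if a hypothetical load attribute takes the values 1, 2, 3, 4 with classes low, low, high, high, the best test is load ≤ 2, which separates the classes completely.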
