Friday, September 4, 2009

Serialization in Java

We all know the Java platform allows us to create reusable objects in memory. However, those objects exist only as long as the Java virtual machine remains running. It would be nice if the objects we create could exist beyond the lifetime of the virtual machine, wouldn't it? Well, with object serialization, you can flatten your objects and reuse them in powerful ways.

Serialization: Object serialization is the process of saving an object's state to a sequence of
bytes, as well as the process of rebuilding those bytes into a live object at some future time.

The following rules apply when we make an object Serializable:
Rule #1: The object to be persisted must implement the Serializable interface or inherit that implementation from its object hierarchy

Rule #2: The object to be persisted must mark all nonserializable fields (fields that don't get saved) as transient

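Rule #3: The class being serialized should declare a serialVersionUID as private static final long serialVersionUID. It identifies the version of the class, so that a mismatch between the class that wrote an object and the class that reads it can be detected and handled. If you don't declare it, the serialization runtime (not the compiler) computes a default value from the details of the class, but that value is sensitive to even small changes in the class and can differ between compiler implementations.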

Rule #4: Serialization does not happen on its own; an object is written only when we ask for it. To save it to a file, a database, or any other destination, create an OutputStream for that destination, wrap it in an ObjectOutputStream, and call the ObjectOutputStream's writeObject method (this is the method that marshals the object's fields into a sequence of bytes), passing the object to be saved. Similarly, to unmarshal the bytes back into an object, call the readObject method of an ObjectInputStream and cast the returned object to its actual type.

Rule #5: If we want to perform some logic (like encrypting a password before saving it and decrypting it upon restoring it), we can declare private writeObject and readObject methods in the class with their exact signatures, perform the logic inside those methods, and call defaultWriteObject and defaultReadObject from them to handle the remaining fields. Strictly speaking, these methods are not overrides; the serialization runtime finds them by their signatures and calls them.

Let us consider the above rules with an example:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Employee implements Serializable {

    // Rule #3: explicit version identifier for this class
    private static final long serialVersionUID = 1L;

    private String name;
    private int empNo;
    private String designation;
    // Rule #2: transient fields are not written to the stream
    private transient String department;

    public Employee() {
    }

    public Employee(String name, int empNo, String designation, String department) {
        this.name = name;
        this.empNo = empNo;
        this.designation = designation;
        this.department = department;
    }

    // Rule #5: remove the comment markers to run custom logic while saving and restoring
    /*private void writeObject(ObjectOutputStream oos) throws IOException {
        if (designation.equalsIgnoreCase("SE"))
            this.designation = "Software Engineer";
        else if (designation.equalsIgnoreCase("SSE"))
            this.designation = "Senior Software Engineer";

        oos.defaultWriteObject();
    }

    private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
        ois.defaultReadObject();
    }*/

    @Override
    public String toString() {
        return "Employee name is: " + this.name + " and his designation is: " + this.designation
                + " his id is: " + this.empNo + " and his department is: " + this.department;
    }
}

public class SerializedEmployee {

    public static void main(String[] args) {
        Employee employee = new Employee("Varun", 123, "SE", "Java");

        // Rule #4: write the object to a file; try-with-resources closes the streams
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("emp.ser"))) {
            oos.writeObject(employee);
            System.out.println("Object saved to the file");
        } catch (IOException e) {
            System.out.println("IOException--->" + e);
        }

        // Read the bytes back and cast the result to its actual type
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("emp.ser"))) {
            Employee emp = (Employee) ois.readObject();
            System.out.println("Object read is: ");
            System.out.println(emp);
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException");
        } catch (IOException e) {
            System.out.println("IOException--->" + e);
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException");
        }
    }
}


Each rule maps to a part of the above example as follows:
Rule #1: The Employee class is a persistent class because it implements Serializable.
Rule #2: The 'department' field is marked as transient, so it will not be persisted. We can see this when we call writeObject and then readObject: the restored object has a null value for the department field. Run the above example and you will see that, though we passed 'Java' as the employee's department, 'department' is shown as 'null' when we print the restored employee object.
Rule #3: We have declared a 'serialVersionUID' for the class. This helps us with version control, which is explained below:
Versioning:
Imagine you create a class, instantiate it, and write it out to an object stream. That flattened object sits in the file system for some time. Meanwhile, you update the class file, perhaps adding a new field. What happens when you try to read in the flattened object?
Well, the bad news is that an exception will be thrown -- specifically, the java.io.InvalidClassException -- because every persistent-capable class that does not declare its own identifier is automatically given one, computed from the details of the class. If the identifier of the class does not equal the identifier of the flattened object, the exception will be thrown. However, if you really think about it, why should it be thrown just because I added a field? Couldn't the field just be set to its default value and then written out next time?

Yes, but it takes a little code manipulation. The identifier that is part of all classes is maintained in a field called serialVersionUID. If you wish to control versioning, you simply have to provide the serialVersionUID field manually and ensure it is always the same, no matter what changes you make to the classfile.
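
To make the identifier visible, here is a minimal sketch that asks the serialization runtime which serialVersionUID it would use for a class. The class names SerialVersionUidDemo, Declared and Computed and the helper uidOf are made up for this illustration; ObjectStreamClass.lookup and getSerialVersionUID are standard java.io calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {

    // Hypothetical class that declares its own identifier, so it stays stable across edits.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
    }

    // Hypothetical class with no declared identifier: the runtime computes one from the class details.
    static class Computed implements Serializable {
        String name;
    }

    // Returns the identifier that would be written to the stream for the given serializable class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Declared: " + uidOf(Declared.class));
        System.out.println("Computed: " + uidOf(Computed.class));
    }
}
```

The first line printed shows the declared value, 1. The second shows a computed value, which is likely to change as soon as a field or method is added to Computed; that is why declaring the field yourself is the safer choice.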

To see this in action, run the SerializedEmployee example once so that emp.ser is written. Then comment out the part that writes the employee object, change the serialVersionUID to 2L (or anything other than 1L), and run it again. You can also try a change from the list below that can adversely affect the saved object.

The Sun documentation lists the various class format changes that can adversely affect the restoration of an object (if the serialVersionUID is changed or not provided). A few of these include:

1 Deleting a field, or changing it from non-static or non-transient to static or transient, respectively.
2 Changing the position of classes in a hierarchy.
3 Changing the data type of a primitive field.



After changing the serialVersionUID, you will notice that you get an exception that states:
"IOException--->java.io.InvalidClassException: Employee; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = 2".

On the other hand, not every change will have a negative effect. Here are some changes to class versions that do not have a detrimental effect on object behavior:

1 Adding fields, which will result in default values (based on data type) being assigned to the new fields upon restoration.
2 Adding classes will still allow an object of the added class to be created, since the class structure information is included in the stream. However, its fields will be set to the default values.
3 Changing the access modifier (public, private, etc.) for a field, since it is still possible to assign a value to the field.
4 Changing a field from static or transient to non-static or non-transient, respectively.



Rule #4: We have created the employee object and want to store it in a file. So, we opened the file with a FileOutputStream and then created an ObjectOutputStream that wraps it. Finally, we called the writeObject method of the stream, passing the employee object. This marshals all the fields of the employee object into a sequence of bytes, except those marked transient (as explained in Rule #2), which are skipped.
Similarly, we retrieve the employee object from the marshalled bytes by calling the readObject method and casting the result back to Employee.
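
The destination does not have to be a file. As a small sketch of the same write and read steps, the following uses an in-memory byte array instead, which also makes the effect of transient easy to check. The names InMemoryRoundTrip, Worker, toBytes and fromBytes are invented for this illustration; wrapping the checked exceptions in IllegalStateException is just a convenience for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class InMemoryRoundTrip {

    // Hypothetical class used only for this demo.
    static class Worker implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient String department; // skipped by default serialization

        Worker(String name, String department) {
            this.name = name;
            this.department = department;
        }
    }

    // Marshals the object into a byte array.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // Unmarshals the bytes back into an object; the caller casts it to the actual type.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Worker copy = (Worker) fromBytes(toBytes(new Worker("Varun", "Java")));
        System.out.println(copy.name + " / " + copy.department);
    }
}
```

Running main prints "Varun / null": the name survives the round trip, while the transient department comes back as its default value.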

Rule #5: Say we have a scenario where we want to process some values just before the object is saved. In this case, we can provide our own writeObject method. So, in our example, remove the comment markers around the writeObject and readObject methods of the Employee class. In writeObject, we check the designation code and replace it with the full designation. After this is done, we call defaultWriteObject(), which performs the default serialization of the non-transient fields, the same work that is done when we do not provide writeObject at all. Similarly, we can perform some logic while unmarshalling (reading) the object in the readObject method. Here we did nothing but call the defaultReadObject method.
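
The password case mentioned in Rule #5 could look roughly like the sketch below. Everything here is an assumption made for illustration: the class names CustomSerializationDemo and Account are invented, and Base64 merely stands in for a real cipher (it is an encoding, NOT encryption). The point is only the shape of the two private methods: the password field is transient, so the default mechanism skips it, and the custom methods write and read it in transformed form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class CustomSerializationDemo {

    // Hypothetical class used only for this demo.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String password; // never written by the default mechanism

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the non-transient field 'user'
            // Placeholder transformation: Base64 is NOT encryption, swap in a real cipher.
            out.writeUTF(Base64.getEncoder().encodeToString(password.getBytes(StandardCharsets.UTF_8)));
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // restores 'user'
            password = new String(Base64.getDecoder().decode(in.readUTF()), StandardCharsets.UTF_8);
        }
    }

    // Serializes the account into a byte array.
    static byte[] toBytes(Account account) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(account);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the account from the bytes produced by toBytes.
    static Account fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Account) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Account copy = fromBytes(toBytes(new Account("varun", "s3cret")));
        System.out.println(copy.user + " / " + copy.password);
    }
}
```

Note that the values must be read back in the same order in which they were written: defaultReadObject first, then the extra value.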

NOTE: If we have a class in an inheritance hierarchy such that the superclass implements Serializable, but we don't want the subclass to be serialized, then the way to achieve it is by declaring the writeObject and readObject methods in the subclass with their exact signatures and throwing NotSerializableException from those two methods.
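
A minimal sketch of that idea follows. The names BlockSerializationDemo, Base, Secret and canSerialize are hypothetical and exist only for this demo; NotSerializableException is the standard java.io exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BlockSerializationDemo {

    // Hypothetical serializable superclass.
    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
        String label = "base";
    }

    // Hypothetical subclass that inherits Serializable but refuses to be serialized.
    static class Secret extends Base {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException("Secret must not be serialized");
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            throw new NotSerializableException("Secret must not be deserialized");
        }
    }

    // Returns true if the object can be written to an object stream, false if it refuses.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("unexpected I/O failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Base:   " + canSerialize(new Base()));
        System.out.println("Secret: " + canSerialize(new Secret()));
    }
}
```

Here a Base object is written normally, while any attempt to write a Secret object fails with NotSerializableException even though Secret inherits Serializable from Base.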
