Content transformation with XML output - Our First Customer

The Client: India's Leading Law Publisher 

The client is a leading publisher that prints and publishes a wide range of legal books, commentaries, judgments, articles, journals, digests, and law reports in print and electronic media. Lawyers and judges refer to these reports and journals when citing similar cases as precedent.

The project required digitizing archived case reports: reports that were available only in printed form were to be made available for reference in electronic format. The information within these reports was to be marked up with predefined ‘tags’ to facilitate display and search.

Solution Design & Deployment: 
The learning needs for the project were studied, and our training and operations teams jointly developed a detailed curriculum that brought our learners up to speed quickly so that they could deliver the solution the client required.
Fig1 - Workflow
The job is received from the client as scanned image files on a designated secure FTP site. Once the content has been extracted from the images using OCR software, the operator proofreads it against the image of the original and then tags the text according to the specifications defined in the DTD file. Typical tags include case title, judgment date, judge name, and advocate name. The final output is delivered to the client in XML format.
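The tagging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (case, case_title, judgment_date, judge_name, advocate_name, body) and the function build_case_xml are hypothetical stand-ins, since the client's actual DTD is not reproduced here, and the sample values are placeholders rather than real case data.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real names come from the client's DTD.
FIELD_TAGS = ("case_title", "judgment_date", "judge_name", "advocate_name")


def build_case_xml(fields, body_text):
    """Wrap proofread OCR text and its extracted fields in XML elements."""
    root = ET.Element("case")
    for tag in FIELD_TAGS:
        child = ET.SubElement(root, tag)
        child.text = fields[tag]  # raises KeyError if a required field is missing
    body = ET.SubElement(root, "body")
    body.text = body_text
    # ElementTree escapes characters such as & and < when serializing.
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
sample_fields = {
    "case_title": "A v. B",
    "judgment_date": "2001-01-01",
    "judge_name": "Judge X",
    "advocate_name": "Advocate Y & Associates",
}
print(build_case_xml(sample_fields, "Text of the report after proofreading."))
```

In a production workflow the generated file would also be validated against the DTD before delivery; that step needs a validating parser and is left out of this sketch.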

Key Performance Parameters:
The client has long-established credibility in its business and is well known for the accuracy of the content that it makes available. As might be expected, accuracy, of both text and tags, was identified as the single most important parameter for this process.
Text accuracy is measured by the number of text errors and has a Service Level of 99.95%. For a typical file of 8 pages and around 19,000 characters, this allows no more than 9 character errors per file.

Tag accuracy is computed from incorrect or omitted tags and has a Service Level of 99%. For a typical file with an average of 150 tags, this allows no more than 1 tagging error per file.
To ensure that service levels are met, multiple levels of quality checks are built into the process, including self-checks, QC by a Quality Executive, and audits of the final output prior to upload. We are currently meeting or exceeding the required service levels on these parameters.
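The error margins quoted for the two service levels follow from simple arithmetic, sketched below. The function name max_errors is a hypothetical helper introduced for illustration; it assumes the margin is the largest whole number of errors that still keeps accuracy at or above the service level.

```python
import math
from fractions import Fraction


def max_errors(total_items, service_level_percent):
    """Largest whole number of errors that still meets the service level.

    service_level_percent is passed as a string (for example "99.95") so
    that the calculation uses exact fractions instead of binary floats.
    """
    allowed_share = 1 - Fraction(service_level_percent) / 100
    return math.floor(total_items * allowed_share)


# Text accuracy: 19,000 characters at 99.95% allows 9.5, so 9 whole errors.
print(max_errors(19000, "99.95"))

# Tag accuracy: 150 tags at 99% allows 1.5, so 1 whole error.
print(max_errors(150, "99"))
```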

Fig2: Visual Output
Business Benefit:
Creation of a digital archive usable as a pay-for-use service by the client’s end customers
Enables information access based on key search criteria
High-quality output generated by a cost-effective solution
