India can achieve tech goals without raising bogey of data, AI colonisation


In March, the Union Cabinet approved the national-level IndiaAI Mission with a budget outlay of ₹10,371.92 crore. (Image: Pixabay)


  Advocates of data sovereignty are critical of tech companies like OpenAI using Indian data, only to sell insights back at high prices
  However, global tech companies have also contributed to India's growth by setting up their centres of excellence

Ola founder Bhavish Aggarwal recently expressed displeasure at big tech companies like OpenAI using data from India to pre-train their large language models (LLMs) and profit from it, describing the trend as a form of data and AI “colonisation.”

According to Aggarwal, while India generates 20% of the world's data, only a fraction of this is stored within the country. He elaborated, "...20%, because we are 20% of the world's population..."

He went on to criticise the practice of exporting Indian data to global data centres only to have companies like OpenAI process it and sell the insights back to India at high prices.

He also drew a historical parallel, asking rhetorically, "Does this sound similar to something that happened 200 years ago? Yeah, East India Company."

As an entrepreneur with a degree in computer science and engineering from IIT Bombay, Aggarwal has proved himself in the field. He not only founded Ola Cabs and Ola Electric but also launched an AI venture called Krutrim, which is now a unicorn, or a startup valued at $1 billion or more.

Aggarwal's concerns about “data sovereignty” and the need for India to become self-reliant are valid. And it’s hard to fault him, or anyone for that matter, for arguing that India must produce more domestically to compete with global powers like the US and China.

This includes everything from batteries and vehicles to semiconductor manufacturing plants and data centres, as well as e-planes, rockets, bullet trains, hyperloops, foundation models, and LLMs to rival tech giants Google, Microsoft, Nvidia, and OpenAI.

This is not the first time that someone has voiced concerns over data and AI “colonisation,” which is broadly defined as big tech companies (mostly from the US) using data from emerging economies to enrich themselves.

Reliance Industries chairman Mukesh Ambani voiced this concern back in December 2018 when he said, “In this new world, data is the new oil. And data is the new wealth. India’s data must be controlled and owned by Indian people and not by corporates, especially global corporations.”

On 20 March, Nvidia CEO Jensen Huang raised this question with a section of the media. He said: “Prime Minister Narendra Modi told me that India should not export flour to import bread. This makes perfect sense. Why export raw material, only to import the value-added product? Why export India’s data, only to import AI?”

Data-aware, self-reliant

Urging Indians to feel proud about making more products in India and exhorting them to be confident when dreaming big about building LLMs, electric vehicles, or even semiconductor fabs (as Ambani and Aggarwal did) is an effective prescription.

The country's entrepreneurs began building India-specific LLMs only after C.P. Gurnani, the then CEO and MD of Tech Mahindra, and Rajan Anandan, MD of venture firm Peak XV Partners, took offence when OpenAI CEO Sam Altman doubted if Indian entrepreneurs could build a generative pre-trained transformer (GPT)-type of LLM with just a $10 million investment, a fraction of what OpenAI was spending.

The result: Indian entrepreneurs have already released local language models including Aggarwal's Krutrim, Tech Mahindra's Project Indus, Sarvam AI's OpenHathi series, AI4Bharat, SML's Hanooman series, Sutra series from Two AI, and CoRover's BharatGPT.

In March, the Union Cabinet approved the national-level IndiaAI Mission with a budget outlay of ₹10,371.92 crore, aimed at developing a manufacturing base for graphics processing units (GPUs) in a public-private partnership, and multi-modal domain-specific LLMs. That very month, India took a big step towards achieving self-reliance in electronics when PM Modi laid the foundation stones for three semiconductor facilities in India, worth about ₹1.25 trillion.

These include a fab by Tata Electronics with Taiwan’s PSMC in Gujarat; an outsourced semiconductor assembly and test (OSAT) facility in Assam, also by Tata Electronics; and an OSAT facility by CG Power in partnership with Renesas.

India is advancing rapidly in space and technology. Beyond ISRO's space missions, Skyroot launched India's first private rocket in 2022, and Agnikul Cosmos launched its sub-orbital test vehicle featuring the world's first single-piece 3D-printed rocket engine.

India is making significant progress in electric vehicles, with Tata Motors, Mahindra Electric, Ashok Leyland Electric, Hyundai, Hero Electric, Kia Motors, and startups Ather and Ola Electric Mobility contributing to this growth.

Cut slack on data, AI colonisation

Here are factors to consider in the data and AI colonisation-versus-self-reliance debate:

First, let’s ask ourselves: Must we paint big tech companies as antagonists in the 'Make-in-India' movie to drive home our concerns over data and AI colonisation? Especially when many big global tech companies continue to contribute towards generating employment in the country and put India on the global research and development (R&D) map with their global capability centres (GCCs)?

Also, experience reveals that pure-bred Indian companies too can manipulate and profit from user data stored within Indian shores.

Second, India's stand on data localisation itself has grey areas. The Digital Personal Data Protection (DPDP) Act, 2023, which was notified last year, eased the stance on cross-border data transfer restrictions by adopting a “blacklist” approach.

This, some experts argue, could lead to a shortage of high-quality datasets that are crucial for AI R&D. Any specific law pertaining to any specific sector will supersede the DPDP Act when it comes to "banning" these geographies, such as Reserve Bank of India regulations that state that the user data of payment systems are to be stored in India.

Third, there is a rapid rise in the number of local data centres in India. Yet, as of March 2024, the US hosted more than 50% (5,381) of the little over 10,000 data centres worldwide, according to Statista, while India had 163, including those set up by Microsoft, Google, NTT Data, and Amazon Web Services.

Fourth, global tech companies have contributed to India's tech growth by setting up their centres of excellence (COEs) in India and investing millions of dollars in setting up incubators and accelerators. They are partnering with educational institutions and Indian startups too.

By 2025, India is expected to have over 1,900 GCCs employing 2 million people. By 2026-27, Nasscom forecasts the number of GCCs in India to exceed 2,000.

Fifth, while Aggarwal may likely disagree, it would take billions of dollars and years to build a single entity that he envisions: one that can train foundational models and LLMs, design custom AI chips, have an AI cloud infrastructure, and build a developer platform too.

OpenAI’s GPT was in the works for more than six years, cost upwards of $100 million and used an estimated 30,000 GPUs. Krutrim is a separate business, according to Aggarwal, for which he raised $50 million from Matrix Partners India.

Sixth, many Indian startups that are developing local language models are building them atop (hence, called wrappers) the foundation models of OpenAI's GPT, Meta's LLaMA, Google's Gemini, and Anthropic's Claude, to name a few. Building a foundation model, as indicated above, is expensive. Thus, they too are capitalising on the very data that big tech companies would have used from India.

Besides, there's a shortage of GPUs (a single Nvidia H100 GPU can cost about $25,000) to run these AI models, and Indian companies have to rely on global tech companies to provide them, even if they run much smaller language models. Moreover, funding for Indic language models is not easy to come by.

Seventh, as suggested by Mihir Kaulgud in an October 2022 paper published by the Social and Political Research Foundation, if India wants to combat data colonialism, it should focus on both the state’s sovereignty and the people who generate data, thus “being attentive to data’s social characteristics and moving the foundations of policy away from the 'data as resource' metaphor.”

Quoting ‘Svensson, J., & Poveda Guillen, O. (2020). What is Data and What Can it be Used For?’, he explained that the data generated by “highly networked societies” has to be “captured, quantified, and processed.”

“All of these practices give the data a particular shape, suggesting that data is deeply cultural and infused with social norms and values,” Kaulgud argued.

Wrapping up

India’s tryst with semiconductors started in the 1960s. It failed many times but did not give up. The current semiconductor projects should provide the building blocks for local microchip-making and the critical semiconductor value chain of design, fabrication, assembly, testing, marking and packaging.

The new investments will manage to make chips of only 28-40 nanometres, while sophisticated plants globally have moved on to 2-3nm. Yet, here too, India is drawing on the expertise of global tech companies, which is a sensible strategy.

On the data front, too, while India may generate 20% of global data, how much of it is good quality data? Besides, foundation models and LLMs of OpenAI, Meta and Google have been pre-trained on data that is 60-70% in English.

Further, many of the 22 official Indian languages do not have digital data, which makes it challenging to build and train AI models with local datasets. Bhashini, a part of the National Language Translation Mission, has so far spent $6-7 million to collect data from different sources.

Ironically, Google India has been working with the Indian Institute of Science (IISc) on Project Vaani, which will gather speech data across India. And so is Microsoft, which has been helping with Indic languages, its partnership with Sarvam AI being a case in point.

The hope here is that the government's proposed unified data platform, the IndiaAI Datasets Platform, will provide Indian startups and researchers access to quality non-personal datasets.

Simply put, public-private partnerships, investments, and global cooperation, even with big tech companies, should be able to push India's AI resolve without the need to raise the bogey of data and AI colonisation, which is, to say the least, not geopolitically right.

