Building Your First Neural Network with TensorFlow

A Step-by-Step Guide to Population Prediction

Introduction

Neural networks might seem intimidating at first, but with modern frameworks like TensorFlow, they're more accessible than ever. In this tutorial, we'll build a simple neural network to predict a country's population based on its capital city's latitude and land area. This is a perfect first project to understand the basics of neural networks and TensorFlow.

Setting Up Your Environment

First, let's install the required packages. We'll need TensorFlow, NumPy, pandas, and Matplotlib (we'll use Matplotlib later to plot the training progress):


pip install tensorflow numpy pandas matplotlib
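
If you want to confirm everything installed correctly, a quick sanity check is to import TensorFlow and print its version (any reasonably recent 2.x release should run the code in this tutorial):

import tensorflow as tf

print(tf.__version__)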

            

Preparing Our Data

Let's create a simple dataset for our example. In a real project, you'd likely load this from a CSV file or database:


import numpy as np
import tensorflow as tf
import pandas as pd

# Sample data (you can expand this)
data = {
    'country': ['Iceland', 'UK', 'France', 'Egypt', 'Kenya', 'South Africa'],
    'latitude': [64.1, 51.5, 48.9, 30.0, -1.3, -30.6],
    'area_km2': [103000, 242500, 551695, 1001450, 580367, 1221037],
    'population': [364134, 67220000, 67390000, 104000000, 54985698, 59308690]
}

df = pd.DataFrame(data)

# Normalise the data (very important for neural networks!)
latitude_mean = df['latitude'].mean()
latitude_std = df['latitude'].std()
area_mean = df['area_km2'].mean()
area_std = df['area_km2'].std()
pop_mean = df['population'].mean()
pop_std = df['population'].std()

X = np.column_stack([
    (df['latitude'] - latitude_mean) / latitude_std,
    (df['area_km2'] - area_mean) / area_std
])
y = (df['population'] - pop_mean) / pop_std
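
Before moving on, it's worth printing the standardized arrays to confirm the transformation worked; every column should now have a mean of roughly 0 and a standard deviation of roughly 1, with individual values mostly falling between about -2 and +2 for a dataset this small:

print(X)                                       # standardized latitude and area
print(y.values)                                # standardized populations
print(X.mean(axis=0), X.std(axis=0, ddof=1))   # approximately [0, 0] and [1, 1]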

            

What's all this normalisation stuff about?

Think of normalization as converting different measurements to the same scale. Imagine you're cooking with one recipe that uses cups and another that uses grams: it's hard to compare them directly! Normalization is like converting everything to the same unit so we can work with the values easily.

In our neural network, we're dealing with very different numbers:

  • Latitudes range roughly from -90 to +90
  • Land areas might be millions of square kilometers
  • Populations could be in billions

This creates two potential problems:

  1. The neural network might think bigger numbers are more important just because they're bigger
  2. These vastly different scales can make the learning process unstable and slow

The solution is to convert all our numbers into roughly the same range, typically centering them around zero. We do this using a technique called "standardization" (also known as z-score normalization), which involves two steps:

  1. First, we find the "mean" (average) of each feature to center the data
  2. Then, we find the "standard deviation" (a measure of how spread out the numbers are) to scale the data

The formula we're using to do this is:

normalized_value = (original_value - mean) / standard_deviation

After this somewhat clever trick, for each feature:

  • Values close to the mean become close to zero
  • Values one standard deviation above the mean become close to 1
  • Values one standard deviation below the mean become close to -1

So when we write:


(df['latitude'] - latitude_mean) / latitude_std

            

We're taking each latitude, subtracting the average latitude (to center it around zero), and then dividing by the standard deviation (to scale it).

It's like telling the neural network: "Don't worry about the actual numbers, just focus on how many standard deviations away from the mean each value is."

For example:

  • A latitude of 64.1°N (Iceland) becomes roughly 1.0 (meaning it sits about one standard deviation above the mean latitude in our dataset)
  • An area of 103,000 km² (Iceland again) becomes roughly -1.2 (meaning it sits about 1.2 standard deviations below the mean area)
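
You can check these numbers directly from the statistics we computed earlier; the snippet below is just the standardization formula applied to Iceland's row:

iceland_lat_z = (64.1 - latitude_mean) / latitude_std     # roughly 1.0
iceland_area_z = (103000 - area_mean) / area_std          # roughly -1.2
print(f"Latitude z-score: {iceland_lat_z:.2f}, area z-score: {iceland_area_z:.2f}")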

The normalisation is crucial because it helps the neural network learn faster, prevents larger-scale features from dominating smaller ones, makes training more stable, and helps the optimizer converge more reliably. Think of it as levelling the playing field: we're making sure the neural network pays equal attention to all our input features, regardless of their original scale.

Building the Neural Network

Now comes the exciting part! We'll create a neural network using TensorFlow's Keras API. Our network will have:

  • An input layer with 2 inputs (latitude and area)
  • A hidden layer with 5 neurons
  • An output layer with 1 neuron (predicted population)

# Create the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='mean_squared_error'
)

# Display the model structure
model.summary()

            

Understanding What's Happening

Let's break down what each part does:

1. The Model Structure

When we create our sequential model, we're building a neural network where each layer feeds into the next layer in sequence. The first Dense layer has:

  • 5 neurons (the first parameter)
  • 'relu' activation (Rectified Linear Unit, which sets negative values to zero and passes positive values through unchanged)
  • input_shape=(2,) because we have two input features
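
To make those sizes concrete, here's a quick way to inspect the weights Keras created for us: the hidden layer holds a 2×5 weight matrix plus 5 biases, and the output layer a 5×1 matrix plus a single bias, which accounts for the 21 trainable parameters reported by model.summary().

# Inspect the weight and bias shapes of each Dense layer
for layer in model.layers:
    weights, biases = layer.get_weights()
    print(layer.name, weights.shape, biases.shape)
# Prints something like: dense (2, 5) (5,) and dense_1 (5, 1) (1,)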

2. The Compilation Step

When we compile the model, we're:

  • Using the 'adam' optimizer - an adaptive variant of gradient descent that adjusts the learning rate for each parameter during training
  • Using 'mean_squared_error' as our loss function - the standard choice for regression problems like this one
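
The string shortcuts are convenient, but you can also pass the optimizer and loss as objects when you want to tweak their settings. This sketch is equivalent to the compile call above, except that it sets an explicit learning rate (the value 0.01 is just an illustrative choice, not a tuned one):

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # explicit learning rate
    loss=tf.keras.losses.MeanSquaredError()
)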

Training the Model

Now we can train our model on the data:


# Train the model
history = model.fit(
    X, y,
    epochs=200,
    verbose=1
)

# Plot the training progress
import matplotlib.pyplot as plt

plt.plot(history.history['loss'])
plt.title('Model Loss During Training')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()

            

Note that, by convention, a capital X refers to the features (in our case, the data about each country) and a lowercase y refers to the target we're trying to predict (in our case, the population).
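
Because the loss is computed on the standardized populations, the numbers in the plot aren't in people. For a rough sense of scale, this snippet (assuming training has finished as above) converts the final epoch's loss back into population units:

final_loss = history.history['loss'][-1]       # MSE on the standardized targets
rmse_people = (final_loss ** 0.5) * pop_std    # corresponding error in actual people
print(f"Final loss (standardized MSE): {final_loss:.3f}")
print(f"Roughly equivalent to an error of {rmse_people:,.0f} people")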

Making Predictions

Let's test our model with some new data:


# Function to make predictions for a new country
def predict_population(latitude, area_km2):
    # Normalize the input using the same statistics as the training data
    lat_norm = (latitude - latitude_mean) / latitude_std
    area_norm = (area_km2 - area_mean) / area_std

    # Make the prediction (the model expects a 2D array: one row, two features)
    pred_norm = model.predict(np.array([[lat_norm, area_norm]]), verbose=0)[0][0]

    # Denormalize the prediction back into actual people
    prediction = (pred_norm * pop_std) + pop_mean

    return prediction

# Test with a new country
test_latitude = 40.4  # Madrid, Spain
test_area = 505990   # Spain's area in km²

predicted_population = predict_population(test_latitude, test_area)
print(f"Predicted population: {predicted_population:,.0f}")

            

Understanding the Results

Our model is making predictions based on patterns it learned from our training data. It's important to note that:

  • The predictions won't be perfect - many other factors influence a country's population
  • More training data would likely improve accuracy
  • The model learns correlations, not causation

Ways to Improve the Model

There are several ways we could enhance this basic model:

  • Add more input features (GDP, coastline length, etc.)
  • Increase the number of hidden layers
  • Adjust the number of neurons per layer
  • Try different activation functions
  • Use regularization to prevent overfitting - regularization adds a "penalty" that keeps the model simpler and more general, encouraging it to use all of its features moderately rather than relying too heavily on any one of them (a small sketch follows this list)
  • Collect more training data
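
As an illustration of a couple of these ideas, here's a sketch of a slightly deeper network with L2 weight penalties on the hidden layers. The layer sizes and the penalty strength of 0.01 are arbitrary choices for demonstration, not values tuned for this problem:

# A deeper variant with L2 regularization (illustrative sizes, not tuned)
improved_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(2,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(8, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1)
])

improved_model.compile(optimizer='adam', loss='mean_squared_error')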

Conclusion

We've built a simple but functional neural network using TensorFlow! While this example might seem basic, it demonstrates the fundamental concepts of:

  • Data preprocessing and normalization
  • Model architecture design
  • Training process
  • Making predictions with the trained model

These same principles apply to much more complex neural networks used in real-world applications. The main differences would be the amount of data, the number of features, and the complexity of the network architecture.